Preface



The tenth Portuguese Conference on Artificial Intelligence, EPIA 2001, was held
in Porto and continued the tradition of previous conferences in the series. It 
returned to the city in which the first conference took place, about 15 years ago. 
The conference was organized, as usual, under the auspices of the Portuguese 
Association for Artificial Intelligence (APPIA, http://www.appia.pt). EPIA 
maintained its international character and continued to provide a forum for
presenting and discussing research on different aspects of Artificial Intelligence.

To promote motivated discussions among participants, this conference strength- 
ened the role of the thematic workshops. These were not just satellite events, 
but rather formed an integral part of the conference, with joint sessions when 
justified. This had the advantage that the work was presented to a motivated 
audience. This was the first time that EPIA embarked on this experience and 
so provided us with additional challenges. 

A word of appreciation must be given to those who actively promoted and or- 
ganized each of the thematic workshops: 

— Fernando Moura Pires and Gabriela Guimaraes, for organizing the workshop 
on Extraction of Knowledge from Data Bases (EKDB 2001); 

— Eugenio Oliveira, for organizing the workshop on Multi Agent Systems: The- 
ory and Applications (MASTA 2001); 

— Jose Alferes and Salvador Abreu, for organizing the workshop on Logic Pro- 
gramming for Artificial Intelligence and Information Systems (LPAI); 

— Pedro Barahona, for organizing the workshop on Constraint Satisfaction and
Operational Research Techniques for Problem Solving (CSOR);

— Luis Torgo, for organizing the workshop on Artificial Intelligence Techniques
for Financial Time Series Analysis (AIFTSA).

The organization of this volume reflects the thematic threads described above 
with some exceptions. 

We would also like to thank all the authors who submitted the 88 articles to the 
conference. From these, 21 (24%) were selected as long papers for this volume. 
A further 18 (about 20%) were accepted as short papers. 

Regarding the geographical origin, the first authors of submitted papers come 
from Portugal (32), Spain (17), Brazil (7), France (5), Slovenia (4), Korea (4), 
USA (3), Germany (3), UK (3), Sweden (3), Austria (2), and also Chile, The
Netherlands, Turkey, Australia, and Poland with one submission each. 

In terms of scientific areas, there were 27 submissions on knowledge extraction 
(10 accepted for this volume), 16 on multi-agents (9 accepted), 12 on logic pro- 
gramming (6 accepted), 10 on constraint solving (3 accepted), 8 on financial 
time series (2 accepted), and 15 on other areas (7 accepted). 

Some articles not included in this volume were selected for oral presentation 
at the conference and/or at the thematic workshops. More information can be 
found at the conference site (http://www.niaad.liacc.up.pt/Events/EPIA01/).
Special thanks to all members of the Program Committee, who took on most
of the burden of reviewing. Thanks also to all other reviewers for their
commitment and collaboration.

We would also like to gratefully acknowledge the support of the Portuguese
Government through FCT (Fundação da Ciência e Tecnologia), University of Porto,
Faculty of Economics of Porto, LIACC (Laboratório de Inteligência Artificial
e Ciência de Computadores), the company Enabler, and other sponsors whose
support is acknowledged at the conference site. 

Finally, we thank Luis Paulo Reis for organizing the activities related to
robosoccer; Rui Camacho, our Publicity Chair; and Rodolfo Matos for technical
support and local organization, in collaboration with Rui Leite, Vera Costa,
Filipa Ribeiro, and Joel Silva.



October 2001 



Pavel Brazdil and Alipio Jorge 




EPIA'01



10th Portuguese Conference on Artificial Intelligence
Porto, Portugal, December 2001 



Programme Co-Chairs

Pavel Brazdil             Univ. Porto
Alipio Jorge              Univ. Porto

Programme Committee

Paulo Azevedo             Univ. Minho
Pedro Barahona            Univ. Nova de Lisboa
Luis Camarinha Matos      Univ. Nova de Lisboa
Helder Coelho             Univ. Lisboa
Ernesto Costa             Univ. Coimbra
Joao Gama                 Univ. Porto
Claude Kirchner           LORIA-INRIA, France
Carolina Monard           Univ. S. Paulo, Brazil
Fernando Moura Pires      Univ. Evora
Arlindo Oliveira          IST, Lisboa
Eugenio Oliveira          Univ. Porto
Ramon Otero               Univ. Coruna, Spain
David Pearce              European Commission (EU)
Gabriel Pereira Lopes     Univ. Nova de Lisboa
Ruy de Queiroz            Univ. F. Pernambuco, Brazil
Jan Rauch                 Univ. of Economics, Czech Rep.
Joao Sentieiro            IST, Lisboa
Carles Sierra             Univ. Catalunia, Spain
Luis Torgo                Univ. Porto







List of Reviewers 

Salvador Abreu 
Alexandre Agustini 
Jose Alferes 
Luis Antunes 
Paulo Azevedo 
Antonio L. Bajuelos 
Pedro Barahona 
Orlando Belo 
Mario R. Benevides 
Carlos L. Bento 
Olivier Boissier 
Rachel Bourne 
Olivier Bournez 
Andrea Bracciali 
Pavel Brazdil 
M. Paula Brito 
Sabine Broda 
Antonio Brogi
Francisco Bueno 
Michele Bugliesi 
Daniel Cabeza 
Rui Camacho 
Margarida Cardoso 
Gladys Castillo 
Luis F. Castro 
Helder Coelho 
Luis Correia 
Manuel E. Correia 
Joaquim P. Costa 
Vitor S. Costa 
Carlos V. Damasio 
Gautam Das 
Gael H. Dias 
Frank Dignum 
Virginia Dignum
Jurgen Dix 
Agostino Dovier 



Jesus C. Fernandez 

Jose L. Ferreira 

Marcelo Finger 

Miguel Filgueiras 

Jose M. Fonseca 

Dalila Fontes 

Michael Fink 

Michael Fisher 

Ana Fred 

Joao Gama 

Pablo Gamallo 

Gabriela Guimaraes 

Nick Jennings 

Alipio Jorge 

Claude Kirchner 

Jorg Keller 

Norbert Kuhn 

King-Ip Lin 

Vitor Lobo 

Alneu A. Lopes 

Luis S. Lopes 

Gabriel P. Lopes 

Ines Lynce 

Luis M. Macedo 

Benedita Malheiro 

Margarida Mamede 

Nuno C. Marques 

Joao Marques-Silva 

Joao P. Martins 

Luis C. Matos 

Carolina Monard 

Nelma Moreira 

Fernando Moura-Pires 

Joao Moura Pires 

Gholamreza Nakhaeizadeh 

Susana Nascimento 

Jose Neves 



Vitor Nogueira 
Eugenio Oliveira 
Ana Paiva 
Joao P. Pedroso 
Francisco C. Pereira 
Luis M. Pereira 
Bernhard Pfahringer 
Alessandra di Pierro 
Jorge S. Pinto 
Foster Provost 
Paulo Quaresma 
Elisa Quintarelli 
Luis P. Reis 
Solange O. Rezende 
Ana P. Rocha 
Antonio J. Rodrigues 
Irene Rodrigues 
Sabina Rossi 
Diana Santos 
Sergey Semenov 
Carles Sierra 
Jaime Sichman 
Joao M. Silva 
Carlos Soares 
Terrance Swift 
Paulo Teles 
F. P. Terpstra 
Ana P. Tomas 
Luis Torgo 
Tarmo Uustalu
Vasco T. Vasconcelos 
Gerhard Widmer
Marijana Zekic-Susac 
Jan Zizka 




Table of Contents 



Invited Speakers 

Agent Programming in Ciao Prolog
    Francisco Bueno

Multi-relational Data Mining: A Perspective
    Peter A. Flach

A Comparison of GLOWER and other Machine Learning Methods for
Investment Decision Making
    Vasant Dhar

Extraction of Knowledge from Databases
edited by Fernando Moura-Pires, Gabriela Guimaraes, Alípio Jorge, Pavel Brazdil

Parallel Implementation of Decision Tree Learning Algorithms
    Nuno Amado, João Gama, Fernando Silva

Reducing Rankings of Classifiers by Eliminating Redundant Classifiers
    Pavel Brazdil, Carlos Soares, Rui Pereira

Non-parametric Nearest Neighbor with Local Adaptation
    Francisco J. Ferrer-Troyano, Jesus S. Aguilar-Ruiz, Jose C. Riquelme

Selection Restrictions Acquisition from Corpora
    Pablo Gamallo, Alexandre Agustini, Gabriel P. Lopes

Classification Rule Learning with APRIORI-C
    Viktor Jovanoski, Nada Lavrac

Proportional Membership in Fuzzy Clustering as a Model of Ideal Types
    S. Nascimento, B. Mirkin, F. Moura-Pires

Evolution of Cubic Spline Activation Functions for Artificial Neural Networks
    Helmut A. Mayer, Roland Schwaiger

Multilingual Document Clustering, Topic Extraction and Data Transformations
    Joaquim Silva, João Mexia, Carlos A. Coelho, Gabriel Lopes




Sampling-Based Relative Landmarks: Systematically Test-Driving
Algorithms Before Choosing
    Carlos Soares, Johann Petrak, Pavel Brazdil

Recursive Adaptive ECOC Models
    Elizabeth Tapia, Jose Carlos González, Javier García-Villalba, Julio Villena

A Study on End-Cut Preference in Least Squares Regression Trees
    Luis Torgo



Artificial Intelligence Techniques for Financial Time Series Analysis
edited by Luis Torgo

The Use of Domain Knowledge in Feature Construction for Financial
Time Series Prediction
    Pedro de Almeida, Luis Torgo

Optimizing the Sharpe Ratio for a Rank Based Trading System
    Thomas Hellström



Multi-agent Systems: Theory and Applications
edited by Eugenio Oliveira

Choice: The Key for Autonomy
    Luis Antunes, João Faria, Helder Coelho

Dynamic Evaluation of Coordination Mechanisms for Autonomous Agents
    Rachel A. Bourne, Karen Shoop, Nicholas R. Jennings

Evolving Multi-agent Viewpoints - an Architecture
    Pierangelo Dell'Acqua, João Alexandre Leite, Luis Moniz Pereira

Enabling Agents to Update Their Knowledge and to Prefer
    Pierangelo Dell'Acqua, Luis Moniz Pereira

Modelling Agent Societies: Co-ordination Frameworks and Institutions
    Virginia Dignum, Frank Dignum

Argumentation as Distributed Belief Revision: Conflict Resolution in
Decentralised Co-operative Multi-agent Systems
    Benedita Malheiro, Eugenio Oliveira

Scheduling, Re-scheduling and Communication in the Multi-agent
Extended Enterprise Environment
    Joaquim Reis, Nuno Mamede

Electronic Institutions as a Framework for Agents' Negotiation and
Mutual Commitment
    Ana Paula Rocha, Eugenio Oliveira

An Imitation-Based Approach to Modeling Homogenous Agents Societies
    Goran Trajkovski

Logics and Logic Programming for Artificial Intelligence
edited by Salvador Abreu, Jose Alferes

Situation Calculus as Hybrid Logic: First Steps
    Patrick Blackburn, Jaap Kamps, Maarten Marx

A Modified Semantics for LUPS
    João Alexandre Leite

On the Use of Multi-dimensional Dynamic Logic Programming to
Represent Societal Agents' Viewpoints
    João Alexandre Leite, Jose Julio Alferes, Luis Moniz Pereira

A Procedural Semantics for Multi-adjoint Logic Programming
    Jesus Medina, Manuel Ojeda-Aciego, Peter Vojtáš

Representing and Reasoning on Three-Dimensional Qualitative
Orientation Point Objects
    Julio Pacheco, Maria Teresa Escrig, Francisco Toledo

Encodings for Equilibrium Logic and Logic Programs with Nested Expressions
    David Pearce, Hans Tompits, Stefan Woltran

A Context-Free Grammar Representation for Normal Inhabitants of
Types in
    Sabine Broda, Luis Damas

Permissive Belief Revision
    Maria R. Cravo, João P. Cachopo, Ana C. Cachopo, João P. Martins

Constraint Satisfaction
edited by Pedro Barahona

Global Hull Consistency with Local Search for Continuous Constraint Solving
    Jorge Cruz, Pedro Barahona

Towards Provably Complete Stochastic Search Algorithms for Satisfiability
    Ines Lynce, Luis Baptista, João Marques-Silva




A Combined Constraint-Based Search Method for Single-Track Railway
Scheduling Problem
    Elias Oliveira, Barbara M. Smith

Planning

A Temporal Planning System for Time-Optimal Planning
    Antonio Garrido, Eva Onaindia, Federico Barber

SimPlanner: An Execution-Monitoring System for Replanning in Dynamic Worlds
    Eva Onaindia, Oscar Sapena, Laura Sebastia, Eliseo Marzal

Hybrid Hierarchical Knowledge Organization for Planning
    Bhanu Prasad

STeLLa: An Optimal Sequential and Parallel Planner
    Laura Sebastia, Eva Onaindia, Eliseo Marzal

Author Index




Agent Programming in Ciao Prolog 



Francisco Bueno and the CLIP Group* 

Facultad de Informatica - UPM 
bueno@fi.upm.es

The agent programming landscape has been revealed as a natural framework 
for developing “intelligence” in AI. This can be seen from the extensive use of 
the agent concept in presenting (and developing) AI systems, the proliferation 
of agent theories, and the evolution of concepts such as agent societies (social 
intelligence) and coordination. 

Although the definition of an agent might still be controversial, agents
have particular characteristics that define them and that are commonly accepted.
An agent has autonomy, reactivity (to the environment and to other agents), and
intelligence (i.e., reasoning abilities). It behaves as an individual, capable of
communicating, and capable of modifying its knowledge and its reasoning.

For programming purposes, and in particular for AI programming, one would
need a programming language/system that allows one to reflect the nature of
agents in the code: to map code to some abstract entities (the “agents”), to
declare well-defined interfaces between such entities, to specify their
individual execution (possibly concurrent, possibly distributed) and their
synchronization, and, last but not least, to program intelligence.

It is our thesis that for the last purpose above the best suited languages are 
logic programming languages. It is arguably more difficult (and unnatural) to 
incorporate reasoning capabilities into, for example, an object oriented language 
than to incorporate the other capabilities mentioned above into a logic language. 
Our aim is, thus, to do the latter: to offer a logic language that provides the 
features required to program (intelligent) agents comfortably. 

The purpose of this talk, then, is not to introduce sophisticated reasoning
theories or coordination languages, but to go through the (low-level, if you want)
features which, in our view, provide for agent programming in a (high-level)
language, based on logic, which naturally offers the capability of programming
reasoning.

The language we present is Ciao, and its relevant features are outlined below. 
Most of them have been included as language-level extensions, thanks to the 
extensibility of Ciao. Hopefully, the Ciao approach will demonstrate how the 
required features can be embedded in a logic programming language in a natural 
way, both for the implementor and for the programmer. 

State and Its Encapsulation: From Modules to Objects. The state that is most
relevant for programming intelligence is the state of knowledge. Classically, a
logic program models a knowledge state, and logic languages also provide the means
to manipulate and change this state: the well-known assert/retract operations.
These can be used for modeling knowledge evolution. On the other hand, the state
of data and its evolution are arguably best modeled with objects. Also, objects
are a basic construct for capturing “individuality” in the sense of agents.

* Daniel Cabeza, Manuel Carro, Jesus Correas, Jose M. Gomez, Manuel Hermenegildo,
Pedro Lopez, German Puebla, and Claudio Vaucheret

What is needed is a neat differentiation between the code that represents the 
state of knowledge and the code that represents the evolution of knowledge. A 
first step towards this is providing state encapsulation. This can be achieved with 
a well-defined, well-behaved module system. This is one of the main principles 
that has informed the design of Ciao. Having a language with modules and 
encapsulated state, the step towards objects is an easy one: the only extra thing 
needed is instantiation. Once we add the ability to create instances of modules, 
we have classes. This has been the approach in O’Ciao, the Ciao sublanguage 
for object orientation. 
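
The step from an encapsulated module to a class can be illustrated outside Ciao. The following is a minimal Python sketch, not O'Ciao code: the class name `KnowledgeModule` and its methods are hypothetical, chosen only to mirror assert/retract acting on a private fact store.

```python
class KnowledgeModule:
    """Hypothetical stand-in for a module whose state is a private fact store."""

    def __init__(self):
        # Encapsulated knowledge state: only the methods below touch it.
        self._facts = set()

    def assert_fact(self, fact):
        """Add a fact to the knowledge state (analogue of assert)."""
        self._facts.add(fact)

    def retract_fact(self, fact):
        """Remove a fact if it is present (analogue of retract)."""
        self._facts.discard(fact)

    def holds(self, fact):
        """Query the current knowledge state."""
        return fact in self._facts


# Instantiation is the only extra ingredient: each instance has its own state.
agent_a = KnowledgeModule()
agent_b = KnowledgeModule()
agent_a.assert_fact(("likes", "alice", "prolog"))
```

Here `agent_a` and `agent_b` share code but not knowledge, which is the sense in which instantiable modules behave as classes.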

Concurrency and Distribution. These two features are provided in Ciao at two
levels. At the language level, there are constructs for concurrent execution and
for distributed execution. At the level of “individual entities”, concurrency and
distribution come through via the concept of active modules/objects.

Reactivity and Autonomy: Active Modules/Objects. A module/object is active
when it can run as a separate process. This concept provides for “autonomy” at
the execution level. The active module service of Ciao allows starting and/or con- 
necting (remote) processes which “serve” a module/object. The code served can 
be conceptually part of the application program, or it can be viewed alternatively 
as a program component: a completely autonomous, independent functionality, 
which is given by its interface. 

Active modules/objects can then be used as “watchers” and “actors” of and
on the outside world. The natural way to do this is through a well-defined,
easy-to-use foreign language interface that allows one to write drivers that
interact with the environment, but which can be viewed as logic facts (or rules)
from the programmer’s point of view. An example of this is the SQL interface of
Ciao to SQL-based external databases.

And More. Adding more capabilities to the language, in particular adding more
sophisticated reasoning schemes, requires that the language be easily extensible.
Extensibility has been another of the guidelines informing the design of Ciao:
the concept of a package is a good example of this. Packages are libraries that
allow syntactic, and also semantic, extensions of the language. They have
already been used in Ciao, among other things (including most of the
above-mentioned features), to provide higher-order constructions like predicate
abstractions, and also fuzzy reasoning abilities.




Multi-relational Data Mining: A Perspective 

Peter A. Flach

Abstract. Multi-relational data mining (MRDM) is a form of data mining
operating on data stored in multiple database tables. While machine learning and data
mining are traditionally concerned with learning from single tables, MRDM is re- 
quired in domains where the data are highly structured. One approach to MRDM 
is to use a predicate-logical language like clausal logic or Prolog to represent and 
reason about structured objects, an approach which came to be known as inductive 
logic programming (ILP) [18,19,15,16,13,17,2,5]. 

In this talk I will review recent developments that have led from ILP to the broader 
field of MRDM. Briefly, these developments include the following: 

- the use of other declarative languages, including functional and higher-order 
languages, to represent data and learned knowledge [9,6,1]; 

- a better understanding of knowledge representation issues, and the importance
of data modelling in MRDM tasks [7,11];

- a better understanding of the relation between MRDM and standard single- 
table learning, and how to upgrade single-table methods to MRDM or down- 
grade MRDM tasks to single-table ones (propositionalisation) [3,12,10,14]; 

- the study of non-classificatory learning tasks, such as subgroup discovery and 
multi-relational association rule mining [8,4,21]; 

- the incorporation of ROC analysis and cost-sensitive classification [20].



A Comparison of GLOWER and other Machine Learning Methods for
Investment Decision Making

Vasant Dhar

Abstract. Prediction in financial domains is notoriously difficult for a number
of reasons. First, theories tend to be weak or non-existent, which makes problem 
formulation open ended by forcing us to consider a large number of independent 
variables and thereby increasing the dimensionality of the search space. Second, 
the weak relationships among variables tend to be nonlinear, and may hold only 
in limited areas of the search space. Third, in financial practice, where analysts 
conduct extensive manual analysis of historically well performing indicators, a 
key is to find the hidden interactions among variables that perform well in
combination. Unfortunately, these are exactly the patterns that the greedy search biases
incorporated by many standard machine learning algorithms will miss.

One of the basic choices faced by modelers is which search method
to use. Some methods, notably tree induction, provide explicit models that are
easy to understand. This is a big advantage of such methods over, say, neural
nets or naive Bayes. My experience in financial domains is that decision makers
are more likely to invest capital using models that are easy to understand. More 
specifically, decision makers want to understand when to pay attention to specific 
market indicators, and in particular, in what ranges and under what conditions 
these indicators produce good risk-adjusted returns. Indeed, many professional
traders have remarked that they are occasionally inclined to make predictions about 
market volatility and direction, but cannot specify these conditions precisely or 
with any degree of confidence. Rules generated by pattern discovery
algorithms are therefore particularly appealing, because they can make
explicit to the decision maker the particular interactions among the various
market indicators that produce desirable results. They can offer the decision maker a
"loose theory" about the problem that is easy to critique. 

In this talk, I describe and evaluate several variations of a new genetic learning 
algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been 
motivated by financial prediction problems, but incorporates successful ideas from 
tree induction and rule learning. I examine the performance of several GLOWER 
variants on a standard financial prediction problem (S&P500 stock returns), us- 
ing the results to identify one of the better variants for further comparisons. I 
introduce a new (to KDD) financial prediction problem (predicting positive and 
negative earnings surprises), and experiment with GLOWER, contrasting it with 
tree- and rule-induction approaches as well as other approaches such as neural 
nets and naive Bayes. The results are encouraging, showing that GLOWER has 
the ability to uncover effective patterns for difficult problems that have weak
structure and significant nonlinearities. Finally, I shall discuss open issues such as the
difficulties of dealing with non-stationarity in financial markets.



Parallel Implementation of Decision Tree Learning Algorithms

Nuno Amado, João Gama, Fernando Silva

Abstract. In the fields of data mining and machine learning, the amount
of data available for building classifiers is growing very fast. There is
therefore a great need for algorithms that can build classifiers from very
large datasets while remaining computationally efficient and scalable.
One possible solution is to employ parallelism to reduce the time spent
building classifiers from very large datasets while preserving
classification accuracy. This work first overviews some strategies for
implementing decision tree construction algorithms in parallel, based on
techniques such as task parallelism, data parallelism, and hybrid
parallelism. We then describe a new parallel implementation of the C4.5
decision tree construction algorithm. Even though the implementation is
still in its final development phase, we present some experimental results
that can be used to predict the expected behavior of the algorithm.



1 Introduction 

Classification has been identified as an important problem in the areas of data
mining and machine learning. Over the years different models for classification
have been proposed, such as neural networks, statistical models such as linear
and quadratic discriminants, decision trees, and genetic algorithms. Among these
models, decision trees are particularly suited for data mining and machine
learning. Decision trees are relatively fast to build and obtain similar,
sometimes higher, accuracy when compared with other classification methods [7,9].
Nowadays the amount of data stored in computers is growing exponentially. It is
therefore important to have classification algorithms that are computationally
efficient and scalable. Parallelism may be a way to reduce the time spent
building decision trees from larger datasets while preserving classification
accuracy [4,5,11,12]. Conceptually, parallelism is easy to obtain, either by
building the tree decision nodes in parallel or by distributing the training
data. However, implementing parallel algorithms for building decision trees is a
complex task, for the following reasons. First, the shape of the tree is highly
irregular and is determined only at runtime, and the amount of processing
required for each node varies. Hence, any static allocation scheme will probably
suffer from a high load imbalance. Second, even if the successors of a node are
processed in parallel, their construction needs part of the data associated with
the parent. If the data is dynamically distributed among the processors which
are responsible for different nodes, then data movements must be implemented. If
the data is not properly distributed, the performance of the algorithm can
suffer from a loss of locality [5,12]. Decision trees are usually built in two
phases: tree construction and simplification. Without doubt, the most
computationally expensive phase is tree construction [2,9,10,12]. Therefore, in
this work, the description of the parallel tree construction algorithms
considers only the first phase.

In the remainder of the paper we first overview some strategies for
implementing parallel decision tree algorithms, and then describe our parallel
implementation of the C4.5 decision tree construction algorithm. We then present
some preliminary results and draw some conclusions.



2 Related Work 

This section overviews some strategies for implementing decision tree construc- 
tion algorithms in parallel using task parallelism, data parallelism and hybrid 
parallelism. 

Task parallelism. The construction of decision trees in parallel by following
a task-parallelism approach can be viewed as dynamically distributing the
decision nodes among the processors for further expansion. A single processor
using the whole training set starts the construction phase. When the number of
decision nodes equals the number of processors, the nodes are split among them.
From this point on, each processor proceeds with the construction of the
decision sub-trees rooted at the nodes assigned to it. This approach suffers, in
general, from bad load balancing, because the trees constructed by the different
processors may differ greatly in size. Also, the implementations presented
in [4] either require the whole training set to be replicated in the memory of
every processor or, alternatively, require a great amount of communication for
each processor to access the examples kept in the other processors' memory.
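
The load-balancing problem of task parallelism can be made concrete with a small Python sketch. It is illustrative only: the subtree costs are invented numbers and the round-robin rule is an assumption, not the allocation scheme of any implementation cited above.

```python
def assign_subtrees(subtree_costs, n_processors):
    """Statically assign frontier node i to processor i % n_processors.

    subtree_costs[i] is a hypothetical amount of work needed to build
    the sub-tree rooted at frontier node i.
    """
    loads = [0] * n_processors
    for i, cost in enumerate(subtree_costs):
        loads[i % n_processors] += cost
    return loads


def imbalance(loads):
    """Ratio of the busiest processor's load to the mean load (1.0 is ideal)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Four frontier nodes, four processors, one sub-tree much larger than the rest.
loads = assign_subtrees([1000, 50, 30, 20], 4)
ratio = imbalance(loads)
```

With these made-up costs one processor carries almost all the work while the other three sit idle for most of the run, which is the irregular-tree effect described above.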

Data parallelism. The use of data parallelism in the design of parallel de- 
cision tree construction algorithms can be generally described as the execution 
of the same set of instructions (algorithm) by all processors involved. The par- 
allelism is achieved by distributing the training set among the processors where 
each processor is responsible for a distinct set of examples. The distribution of 
the data can be performed in two different ways, horizontally or vertically. 

The parallel strategy based on vertical data distribution [5] consists in
splitting the data by assigning a distinct set of attributes to each processor.
Each processor keeps in its memory only the values, for all examples, of the
attributes assigned to it, together with the class values. During the evaluation
of the possible splits, each processor is responsible only for the evaluation of
its own attributes. Parallelism with vertical data distribution can still suffer
from load imbalance, because evaluating continuous attributes requires more
processing than evaluating discrete attributes.
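
As a rough illustration of vertical distribution and of why it can still be unbalanced, consider the following Python sketch. The round-robin assignment and the cost factor for continuous attributes are assumptions made for the example, not values taken from [5].

```python
def partition_attributes(attributes, n_processors):
    """Round-robin assignment of attribute names to processors."""
    return [attributes[p::n_processors] for p in range(n_processors)]


def estimated_cost(attr_types, assigned, continuous_factor=4):
    """Toy cost model: a continuous attribute costs `continuous_factor`
    units to evaluate, a discrete one costs 1 (both numbers are assumed)."""
    return sum(
        continuous_factor if attr_types[a] == "continuous" else 1
        for a in assigned
    )


attr_types = {
    "age": "continuous",
    "income": "continuous",
    "sex": "discrete",
    "region": "discrete",
}
parts = partition_attributes(["age", "sex", "income", "region"], 2)
costs = [estimated_cost(attr_types, part) for part in parts]
```

Both processors receive two attributes, yet under this toy cost model the one holding the two continuous attributes does four times the work.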
The parallel strategy based on horizontal data distribution consists in
distributing the examples evenly among the processors. Each processor keeps in
its memory only a distinct subset of the examples of the training set. The
possible splits of the examples associated with a node are evaluated by all the
processors, which communicate with one another to find the global values of the
criteria used and, from these, the best split. Each processor then performs the
split locally. In [11], the authors describe the implementation of two
algorithms with horizontal data distribution: the SLIQ and SPRINT systems. For
the evaluation of a split based on a continuous attribute, these systems keep
the set of examples sorted by the values of the attribute. In order to avoid
sorting the examples every time a continuous attribute is evaluated, they use
separate lists of values for each attribute, which are sorted once at the
beginning of the tree construction. SLIQ also uses a special list, called the
class list, which holds the class value of each example and stays resident in
memory during the whole tree construction process. In the parallel
implementation of SLIQ the training set is distributed horizontally among the
processors, and each processor is responsible for creating its own attribute
lists. In SPRINT, the class list is eliminated by adding the class label to the
entries of the attribute lists. The index in the attribute lists is now the
index of the example in the training set. The evaluation of the possible splits
is also performed by all processors, which communicate with one another to find
the best split. After finding the best split, each processor is responsible for
splitting its own attribute lists. Both systems report good performance and
scalability results, and are also capable of processing very large datasets.
However, the operation that splits the set of examples associated with a node
requires a high communication load in both systems.
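
The communication step in horizontal distribution amounts to combining per-processor statistics before scoring a split. The Python sketch below shows the idea on a single machine: each "processor" is just a list of examples, and information gain is used as a stand-in for whatever split criterion a given system applies. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(examples, attr, threshold):
    """Class counts, on one processor, on each side of attr <= threshold."""
    left, right = Counter(), Counter()
    for row, label in examples:
        (left if row[attr] <= threshold else right)[label] += 1
    return left, right


def entropy(counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def global_gain(per_processor_counts):
    """Sum the local counts (the communication step), then score the split."""
    left, right = Counter(), Counter()
    for l, r in per_processor_counts:
        left += l
        right += r
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    children = 0.0
    if n_left:
        children += n_left / n * entropy(left)
    if n_right:
        children += n_right / n * entropy(right)
    return entropy(left + right) - children


# Two processors, each holding a distinct subset of the training examples.
proc0 = [({"x": 1}, "a"), ({"x": 2}, "a")]
proc1 = [({"x": 5}, "b"), ({"x": 6}, "b")]
gain = global_gain([local_counts(p, "x", 3) for p in (proc0, proc1)])
```

Only the small count tables cross processor boundaries when a split is evaluated; the examples themselves stay where they are until the split is actually performed.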

Hybrid parallelism. Parallel decision tree construction algorithms that use
hybrid parallelism combine data parallelism (with horizontal or vertical
distribution) and task parallelism. The implementation of hybrid parallel
algorithms is strongly motivated by the trade-off between balancing the amount
of processing at each node and the required volume of communication. For nodes
covering a significant number of examples, data parallelism is used, to avoid
the load imbalance and poor use of parallelism associated with task parallelism.
For nodes covering fewer examples, however, the time spent on communication can
exceed the time spent processing the examples. To avoid this problem, when the
number of examples associated with a node falls below a specific value, one of
the processors continues alone with the construction of the tree rooted at that
node (task parallelism). Two parallel decision tree construction algorithms
using hybrid parallelism are described in [6,12].
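
The switching rule of the hybrid approach can be written down in a few lines. This Python sketch assumes a single fixed threshold; the threshold value and node sizes below are made up for illustration, and the algorithms in [6,12] may decide differently.

```python
def choose_strategy(n_examples, threshold):
    """Return "data" for large nodes and "task" for nodes below the threshold."""
    return "data" if n_examples >= threshold else "task"


def plan(node_sizes, threshold):
    """Map each node name to the strategy used to expand it."""
    return {
        node: choose_strategy(size, threshold)
        for node, size in node_sizes.items()
    }


# Hypothetical node sizes (number of examples covered by each node).
schedule = plan({"root": 10000, "left": 6000, "right": 400}, threshold=1000)
```

Nodes near the root are expanded cooperatively, while small nodes deeper in the tree are handed to a single processor so that communication does not dominate.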



3 C4.5 Parallel Implementation 

In this section, we describe a new parallel implementation of the C4.5 decision
tree construction algorithm. This implementation follows a horizontal data
parallelism strategy similar to that used by the parallel implementation of
SLIQ [11], but modified for the C4.5 algorithm. Our parallel implementation was
designed to be executed in a distributed memory environment where each of the
k processors has its own private memory. It addresses two fundamental issues:
load balance and locality. A uniform load balance is achieved by distributing
the examples horizontally and equally among the k processors and by using a new
breadth-first strategy for constructing the decision tree. As in the parallel
version of SLIQ, each processor is responsible for building its own attribute
lists and class lists from the subset of examples assigned to it. The entries
in the class lists keep, for each example, the class label, the weight (used in
the C4.5 algorithm for dealing with unknown values), the global index of the
example in the training set and a pointer to the node in the tree to which the
example belongs. The attribute lists also have an entry for each example, with
the attribute value and an index pointing to the example's corresponding entry
in the class list. The continuous attribute lists are globally sorted by the
values of the attribute using the sorting algorithm described in [3]. Each
processor then has, for each continuous attribute, a list of sorted values: the
first processor holds the lowest values of the attribute, the second holds
values higher than those of the first processor and lower than those of the
third, and so on. Because of the global sort of the continuous attributes, a
processor can hold attribute values that do not correspond to any of the
examples initially assigned to it. After the sort, each processor updates its
class list with the information of the examples corresponding to the new values
of the continuous attributes.
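
The two data structures can be pictured with a small sketch. It is only an
illustration under stated assumptions: the names ClassListEntry,
AttributeListEntry and build_lists are hypothetical, an integer node id stands
in for the pointer to the tree node, and only a local sort is shown (the global
sort across processors from [3] is not reproduced).

```python
from dataclasses import dataclass


@dataclass
class ClassListEntry:
    label: str         # class label of the example
    weight: float      # C4.5 weight, used for unknown attribute values
    global_index: int  # position of the example in the whole training set
    node_id: int = 0   # assumption: integer id instead of a node pointer


@dataclass
class AttributeListEntry:
    value: float         # attribute value of the example
    class_list_pos: int  # index of the example's entry in the class list


def build_lists(examples, labels, first_global_index):
    """Build the class list and one attribute list per attribute for the
    block of examples assigned to one processor (hypothetical helper)."""
    class_list = [
        ClassListEntry(label, 1.0, first_global_index + i)
        for i, label in enumerate(labels)
    ]
    n_attrs = len(examples[0]) if examples else 0
    attr_lists = []
    for a in range(n_attrs):
        entries = [AttributeListEntry(row[a], i) for i, row in enumerate(examples)]
        # Local sort only; each entry keeps its link to the class list.
        entries.sort(key=lambda e: e.value)
        attr_lists.append(entries)
    return class_list, attr_lists
```

Because every attribute entry carries its class list position, an attribute
list can be reordered freely without losing the label, weight or node of the
example it came from.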

Several authors [2,10] pointed out one of the efficiency limitations of C4.5:
the repetitive sorting of the examples covered by a node every time a continuous 
attribute is evaluated. This limitation is eliminated with the implementation of 
the attribute lists where the continuous attributes are sorted only once. The 
main problems in the parallel tree construction process reside in performing the 
split and in finding the best split for the set of examples covered by a node. 
For these two steps, communication among processors is necessary in order to 
determine the best global split and the assignments of examples to the new 
subsets resulting from the split. 

In the parallel version of the algorithm, each processor holds a distinct
subset of the values of the continuous attributes in its globally sorted lists.
Before each processor starts evaluating the possible splits of the examples,
the distributions must be initialized to reflect the examples assigned to the
other processors. Each processor finds the distributions of its local set of
examples for each attribute and sends them to the other processors. For the
nominal attributes, these distributions are gathered in all processors, and all
processors compute the gain for the nominal attributes. For continuous
attributes, the gain of a possible split is computed from the distributions
before and after the split point. When a processor receives the distributions
from the others, it initializes, for each continuous attribute, the
distribution before the split point with the distributions of the processors of
lower rank. The distribution after the split point is initialized with those of
the processors of higher rank. After evaluating all possible divisions of its
local set of examples, each processor communicates with the others in order to
find the best split among the k local best splits.
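
The initialization of the distributions for a continuous attribute can be
sketched as follows. This is a simplified, single-process illustration and not
the implementation described here: it uses plain information gain (no example
weights, no gain ratio), it simulates the exchange with a list of per-processor
class counts instead of MPI calls, it ignores equal values that straddle two
processors, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Entropy (in bits) of a class distribution given as a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_local_split(rank, local_sorted, all_counts):
    """Best (gain, threshold) among the values held by processor `rank`.

    local_sorted: (value, label) pairs held by this processor, sorted by value.
    all_counts:   one Counter of class labels per processor, in rank order.
    """
    before = Counter()                 # classes of examples on lower ranks
    for counts in all_counts[:rank]:
        before.update(counts)
    after = Counter()                  # this rank and all higher ranks
    for counts in all_counts[rank:]:
        after.update(counts)
    total = sum(before.values()) + sum(after.values())
    base = entropy(before + after)
    best = (float("-inf"), None)
    for i, (value, label) in enumerate(local_sorted):
        before[label] += 1             # the example moves to the left side
        after[label] -= 1
        if i + 1 < len(local_sorted) and local_sorted[i + 1][0] == value:
            continue                   # no threshold between equal values
        n_before = sum(before.values())
        n_after = total - n_before
        if n_after == 0:
            continue                   # nothing would go to the right side
        gain = (base
                - n_before / total * entropy(before)
                - n_after / total * entropy(after))
        if gain > best[0]:
            best = (gain, value)
    return best
```

The global best split is then the maximum over the k local results, which in
the distributed setting corresponds to the final communication step among the
processors.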

As soon as the best split is found, it is performed by creating the child
nodes and dividing the set of examples covered by the node. This step requires
each processor to update the pointers to the nodes in its class list. As a
result of the split, each attribute list is divided into several sublists, one
for each new child node. To perform the split, each processor scans its local
list of the chosen attribute; depending on the value of the attribute, each
entry is moved to the corresponding sublist and the pointer to the node in the
class list is updated.

The pointers to the nodes in the class list are used to divide the remaining
attribute lists. Before that, however, the class lists in each processor must
be updated. The entries in the class list of one processor whose values for the
chosen attribute are kept in another processor must be updated with the node
to which the corresponding examples were assigned by that other processor. For
each example in the set covered by the node, the index, the weight and the node
to which it was assigned are gathered in all processors, allowing each
processor to update its local class list and divide the attribute lists. While
dividing the attribute lists, each processor finds its local class
distributions for each attribute in the new sets of examples assigned to the
child nodes. Still in this phase, the distributions are scattered so that each
processor knows the global class distributions used later during the split
evaluation phase.

The parallel decision tree algorithm described in this work preserves most
of the main features of C4.5. One of the most important is the ability to deal
with unknown attribute values. Furthermore, since the same evaluation criteria
and splits are considered in the parallel version, it obtains the same decision
trees and classification results as those obtained by C4.5. Our parallel system
modifies the C4.5 algorithm in order to use attribute lists and class lists,
the fundamental data structures used to achieve parallelism in the SLIQ
strategy. The main reason for using separate lists for the attributes is to
avoid sorting the examples every time a continuous attribute is evaluated. The
repeated sorts performed by C4.5 during the evaluation of the attributes are
one of its efficiency limitations [2,10]; if they were kept in the parallel
version, they would strongly limit the performance of the algorithm. Using
separate lists for the attributes makes it possible to globally sort the
continuous attributes only once, at the beginning of the tree construction
process.

The SPRINT strategy was developed to overcome the use of centralized
structures such as the class list. It avoids the class lists by extending the
attribute lists with two extra fields, the class label and the global index of
the example in the training set. The main disadvantage stated in [11] for the
use of the class list is that it can limit the size of the training set that
can be used with the algorithm, because it must stay resident in memory during
the execution of the algorithm. Suppose now that we have a training set with N
examples and A attributes, and that each entry value of the attribute lists and
class list requires one word of memory. Suppose also that we have k processors
available for the execution of the algorithms. In this situation, SPRINT will
need 3 * A * N/k memory words in each processor to store the training set
divided equally among the k processors. For the algorithm described in this
work, the worst situation happens when, due to the global sort of the
continuous attribute lists, the class list must store the information for all
examples in the training set. In this situation, the amount of memory necessary
to store the training set in each processor will be 2 * A * N/k + 3 * N memory
words. It is easy to observe that when the number of attributes is more than
three times the number of available processors (A > 3 * k), it is better, in
terms of memory, to use the class list than the extended attribute lists.
Another disadvantage stated in [11] regarding the use of the class list is the
time required to update it during the split. In the implementation of SLIQ, the
class list is entirely replicated in the memory of each processor. After the
split, each processor has to update the entire list, which limits the
performance of the algorithm.
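
The break-even point between the two layouts can be checked with a short
worked comparison. This is a sketch that assumes the worst-case class list
costs 3N words per processor, the value consistent with the stated condition
A > 3k; the symbols M_SPRINT and M_class are introduced here only for
illustration.

```latex
\begin{align*}
M_{\text{SPRINT}} &= \frac{3AN}{k}, \qquad
M_{\text{class}} = \frac{2AN}{k} + 3N, \\
M_{\text{class}} < M_{\text{SPRINT}}
  &\iff 3N < \frac{AN}{k}
  \iff A > 3k .
\end{align*}
```

With k = 4 processors, for instance, the class list layout needs less memory
per processor only for training sets with more than 12 attributes.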

As in SLIQ and SPRINT, in the parallel version of C4.5 the processors exchange,
for the split, the global indexes of the examples covered by a node together
with the child node to which each example was assigned. Each processor keeps a
list of pointers to the entries of the class list, sorted by example index, in
order to speed up the searches when updating the class list with the
information gathered from the other processors.
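
One way to picture this sorted list of pointers is a binary search over the
global indexes. The sketch below is an assumption about how such a lookup could
be organized, not a description of the actual code: class list entries are
plain dictionaries, and build_index and apply_assignments are hypothetical
names.

```python
from bisect import bisect_left


def build_index(class_list):
    """Return the global indexes in ascending order together with the
    class list positions they refer to."""
    order = sorted(range(len(class_list)),
                   key=lambda pos: class_list[pos]["global_index"])
    keys = [class_list[pos]["global_index"] for pos in order]
    return keys, order


def apply_assignments(class_list, keys, order, assignments):
    """Update the node of every locally stored example named in
    `assignments`, a sequence of (global_index, node_id) pairs gathered
    from all processors. Returns the number of entries updated."""
    updated = 0
    for global_index, node_id in assignments:
        i = bisect_left(keys, global_index)
        if i < len(keys) and keys[i] == global_index:
            class_list[order[i]]["node"] = node_id
            updated += 1
        # Indexes not found belong to examples held by other processors.
    return updated
```

Each lookup costs a logarithmic number of comparisons in the size of the local
class list, instead of a linear scan per gathered example.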



4 Experimental Results 

Our parallel implementation is still being refined; the results are therefore
only preliminary and, we believe, subject to further improvement. However, they
can be used to predict the expected behavior of the parallel algorithm. As
mentioned before, our decision tree algorithm uses the same criteria and
procedures as the C4.5 algorithm. Therefore, the two produce the same decision
trees with equal classification accuracy. Consequently, we evaluate the
performance of our system only in terms of the execution time required for the
construction of the decision tree. The pruning phase of the algorithm was not
included in the time measured in the experiments, as it is negligible.

The parallel version of C4.5 was implemented using the standard MPI
communication primitives [8]. The use of standard MPI primitives makes the
implementation of the algorithm highly portable to clusters of machines, that
is, to distributed memory environments. The experiments presented here were
conducted on a PC server with four Pentium Pro CPUs and 256 MB of memory
running Linux 2.2.12 from the standard Red Hat Linux release 6.0 distribution.
Each CPU in the PC server runs at 200 MHz and contains 256 KB of cache memory.
All the time results presented here are the average of three executions of the
algorithm.

All experiments used the synthetic dataset from Agrawal et al. [1]. This
dataset has been previously used for testing parallel decision tree
construction algorithms such as the systems SLIQ and SPRINT [11,12]. Each
example in the dataset has 9 attributes, of which 6 are continuous and 3 are
discrete. The values of the attributes are randomly generated. Agrawal et al.
also describe 10 classification functions of different complexities based on
the values of these attributes. For the datasets used in the experiments, only
one of those functions, named Function2, was considered.

Fig. 1. Speedup results.

Fig. 2. Scale-up results.

The first set of experiments measured the performance of the algorithm in terms
of speedup. For each experiment, we kept the total size of the training set
constant and varied the number of processors from 1 to 4. Figure 1 illustrates
the speedup results for training sets of 100k, 200k and 400k examples. Overall,
our algorithm shows good speedup results and, as expected, the best results
were obtained for the largest dataset used (400k examples). The slowdown
observed for the largest processor configuration is mainly due to the high
communication costs in the split phase. The next set of experiments aimed to
test the scale-up characteristics of the algorithm. In these experiments, for
each scale-up measure, the size of the training set in each processor was kept
constant (100k examples) while the number of processors varied from 1 to 4.
Figure 2 shows that our algorithm achieves good scale-up behavior. Again, high
communication overheads in the splitting phase affect the scale-up capacity as
the number of processors increases.



5 Conclusion

In this work we reviewed some parallel algorithms for constructing decision
trees and proposed a new parallel implementation of the C4.5 decision tree
construction algorithm. Our parallel algorithm uses a strategy similar to that
of SLIQ but adapted for the C4.5 algorithm. Furthermore, it preserves the main
features of C4.5, such as the ability to deal with unknown attribute values. As
a result, our system builds the same decision trees as those obtained by C4.5,
with the same classification accuracy.

Based on the preliminary results presented, the algorithm shows good speedup
and scale-up performance, indicating that it can be used to build decision
trees from very large datasets. However, some work has yet to be done. The
results obtained when measuring the speedup characteristics of the algorithm
indicate that the communication overhead is a key factor limiting the
performance of the algorithm.



Abstract. Several methods have been proposed to generate rankings of
supervised classification algorithms based on their previous performance
on other datasets [8,4]. Like any other prediction method, ranking methods
will sometimes err; for instance, they may not rank the best algorithm
in the first position. Often the user is willing to try more than
one algorithm to increase the chance of identifying the best one. The
information provided by the ranking methods mentioned is not quite
adequate for this purpose: they do not identify those algorithms in
the ranking that have a reasonable chance of performing best. In this
paper, we describe a method for that purpose. We compare our method
to the strategy of executing all algorithms and to a very simple reduction
method, consisting of running the top three algorithms. Throughout this work
we take time as well as accuracy into account. As expected, our method
performs better than the simple reduction method and shows a more
stable behavior than running all algorithms.

Keywords: meta-learning, ranking, algorithm selection 



1 Introduction 

There is a growing interest in providing methods that would assist the user
in selecting appropriate classification (or other data analysis) algorithms.
Some meta-learning approaches attempt to suggest one algorithm while others
provide a ranking of the candidate algorithms [8,4]. The ranking is normally
used to express certain expectations concerning performance (e.g. accuracy).
The algorithm that appears in a certain position in the ranking is expected to
be in some way better (or at least not worse) than the ones that appear after
it. So, if the user is looking for the best performing algorithm, he can just
follow the order defined by the ranking and try the corresponding algorithms
out. In many cases the user may simply use the algorithm at the top of the
ranking. However, the best performing algorithm may appear further down the
ranking. So, if the user is interested in investing more time in this, he can
run a few more algorithms and identify the one that is best. Therefore our aim
is (a) to come up with a ranking that includes only some of the given candidate
algorithms, and (b) not to increase the chance that the best performing
algorithm is excluded from the ranking. In this paper we describe an extension
to the basic ranking method that has been designed for this aim.



P. Brazdil and A. Jorge (Eds.): EPIA 2001, LNAI 2258, pp. 14-21, 2001. 
© Springer-Verlag Berlin Heidelberg 2001




Reducing Rankings of Classifiers by Eliminating Redundant Classifiers 






In the next section we briefly review the basic ranking method. Section 3 
describes the method used to exclude items from a given ranking. Section 4 
presents the evaluation method adopted, while the following section describes 
the experimental setting and results. In Section 6 some alternative evaluation 
methods are briefly outlined and the last section presents the conclusions. 

2 Constructing Rankings of Classifiers on Past 
Experience 

Earlier we have proposed a method [8] that divides the problem of generating a 
ranking into two distinct phases. In the first one we employ a k-Nearest Neighbor 
method to identify datasets similar to the one at hand. This is done on the basis 
of different data characterization measures. In the second phase we consider 
the performance of the given classification algorithms on the datasets identified. 
This method is based on the Adjusted Ratio of Ratios (ARR), a multicriteria 
evaluation measure combining accuracy and time to assess relative performance. 
In ARR the compromise between the two criteria involved is given by the user
in the form “the amount of accuracy I’m willing to trade for a 10 times
speed-up is AccD%”. The ARR of algorithm $a_p$ when compared to algorithm $a_q$
on dataset $d_i$ is defined as follows:



$$ARR^{d_i}_{a_p,a_q} = \frac{A^{d_i}_{a_p} / A^{d_i}_{a_q}}{1 + \log\left(T^{d_i}_{a_p} / T^{d_i}_{a_q}\right) \cdot AccD}$$



where $A^{d_i}_{a_p}$ and $T^{d_i}_{a_p}$ are the accuracy and time of $a_p$ on
$d_i$, respectively. Assuming without loss of generality that algorithm $a_p$
is more accurate but slower than algorithm $a_q$ on dataset $d_i$, the value of
$ARR^{d_i}_{a_p,a_q}$ will be 1 if $a_p$ is AccD% more accurate than $a_q$ for
each order of magnitude that it is slower than $a_q$. $ARR^{d_i}_{a_p,a_q}$
will be larger than one if the advantage in accuracy of $a_p$ is greater than
AccD% for each additional order of magnitude in execution time, and vice versa.
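
A small numeric sketch may help to read the definition. It assumes that the
logarithm is taken in base 10 (our reading of "order of magnitude") and that
AccD is passed as a fraction, e.g. 0.10 for 10%; the function name arr is
hypothetical.

```python
import math


def arr(acc_p, acc_q, time_p, time_q, acc_d):
    """Adjusted ratio of ratios of algorithm p relative to algorithm q on a
    single dataset. acc_d is AccD as a fraction (assumption: 0.01 means 1%)."""
    accuracy_ratio = acc_p / acc_q
    time_penalty = 1.0 + math.log10(time_p / time_q) * acc_d
    return accuracy_ratio / time_penalty
```

For example, with AccD = 10%, an algorithm that is 10% more accurate (0.88
against 0.80) but ten times slower obtains an ARR of 1: the gain in accuracy
exactly pays for the extra order of magnitude in time.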

The individual ARR measures are used as a basis for generating a rank- 
ing. Basically, the method aggregates the information for different datasets by 
calculating overall means of individual algorithms: 



$$ARR_{a_p} = \frac{\sum_{d_i} \sum_{a_q \neq a_p} ARR^{d_i}_{a_p,a_q}}{n(m-1)} \qquad (1)$$



where $ARR_{a_p}$ represents the mean of the individual ARR measures for
algorithm $a_p$, n represents the number of datasets and m the number of
algorithms. The values of $ARR_{a_p}$ are then examined and the algorithms are
ordered according to these values.
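
The aggregation step can be sketched as below, under the assumption that the
individual ARR values are already available in a table indexed by dataset and
by ordered pair of algorithms. The names rank_by_mean_arr and arr_table are
hypothetical.

```python
def rank_by_mean_arr(arr_table, algorithms, datasets):
    """Order the algorithms by their mean ARR.

    arr_table[(d, p, q)] holds the ARR of algorithm p relative to
    algorithm q on dataset d (assumed to be precomputed).
    """
    n, m = len(datasets), len(algorithms)
    mean_arr = {}
    for p in algorithms:
        total = sum(arr_table[(d, p, q)]
                    for d in datasets
                    for q in algorithms if q != p)
        mean_arr[p] = total / (n * (m - 1))
    ranking = sorted(algorithms, key=lambda a: mean_arr[a], reverse=True)
    return ranking, mean_arr
```

The first element of the returned ranking is the algorithm with the highest
mean ARR over all datasets and all competitors.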

As has been shown, this method provides quite good rankings overall and
is quite competitive when compared to other approaches, such as DEA [4]. One
disadvantage of this method is that it does not provide any help as to how many
algorithms the user should actually try out. This problem is addressed in the
next section.







Pavel Brazdil, Carlos Soares, and Rui Pereira 



3 Reducing Rankings of Classifiers by Eliminating 
Redundant Classifiers 

There are two main reasons why we would want to eliminate algorithms from
a given ranking. The first one is that the ranking recommended for a given
dataset may include one or more algorithms that have proven to be significantly
worse than others on similar datasets in the past and, thus, have virtually no
chance of achieving better results than their competitors on the current
dataset. These algorithms can thus be dropped. The second one is that the given
ranking may include one or more clones of a given algorithm that are virtually
indistinguishable from it, or other algorithms with very similar performance.
These algorithms can again be dropped.

We may ask why we would want to include clones in the candidate set in the 
first place. The answer is that recognizing clones is not an easy matter. Different 
people may develop different algorithms and give them different names. These, in 
some cases, may turn out to be quite similar in the end. There is another reason 
for that. We may on purpose decide to include the same algorithm several times 
with different parameter settings. We must thus have a way to distinguish useful 
variants from rather useless clones. Our objective here is to describe a method 
that can be used to do that. 

Suppose a given ranking $R$ consists of n classifiers, $C_1, \ldots, C_n$. Our
aim is to generate a reduced ranking, which contains some of the items in $R$.
The method considers all algorithms in the ranking, following the order in
which they appear. Suppose we have come to algorithm $C_i$. The subsequent
algorithms $C_j$ (where $j > i$) are then examined one by one. The aim is to
determine whether each algorithm $C_j$ should be left in the ranking or
dropped. This decision is made on the basis of past results on datasets that
are similar to the one at hand, i.e. datasets where the performance of the
algorithms is similar to the performance that is expected on the new dataset.
If the algorithm $C_j$ has achieved significantly better performance than $C_i$
on some datasets (in short, $C_j \gg C_i$), the algorithm is maintained. If it
did not, then two possibilities arise. The first one is that $C_j$ is
significantly worse than $C_i$ ($C_j \ll C_i$) on some datasets, and comparable
to $C_i$ on others. In this case $C_j$ can be dropped. The second possibility
is that neither algorithm has a significant advantage over the other
($C_j \approx C_i$), indicating that they are probably clones. In this case
too, $C_j$ can be dropped. This process is then repeated until all combinations
have been explored.
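
The reduction procedure can be sketched in a few lines. This is an
illustrative reading of the description above rather than the exact
implementation: the predicate better_on_some is a hypothetical placeholder for
the statistical comparison on similar datasets, and reduce_ranking is a
hypothetical name.

```python
def reduce_ranking(ranking, better_on_some):
    """Drop from `ranking` every algorithm that is not significantly better,
    on at least some similar datasets, than each algorithm retained before it.

    better_on_some(cj, ci) must return True when cj has been significantly
    better than ci on some of the similar datasets (supplied by the caller).
    """
    kept = list(ranking)
    i = 0
    while i < len(kept):
        ci = kept[i]
        # Examine the algorithms that follow ci, in order. Those that never
        # beat ci are either worse than ci or probable clones of it.
        survivors = [cj for cj in kept[i + 1:] if better_on_some(cj, ci)]
        kept = kept[:i + 1] + survivors
        i += 1
    return kept
```

The first algorithm of the original ranking is always retained, and the
retained algorithms need not be consecutive in the original ranking.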

Here performance is judged according to the ARR measure presented earlier,
which combines both accuracy and time. The historical data we use contains
information on the different folds of the cross-validation procedure. It is
thus relatively easy to conduct a statistical test determining whether
$ARR^{d_i}_{C_j,C_i}$ is consistently greater than 1 on the different folds,
indicating that $C_j$ is significantly better than $C_i$ on dataset $d_i$.
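
One possible form of such a test is a one-sided, one-sample t test of the
per-fold ARR values against 1. The sketch below is an assumption about how the
check could be carried out, not the exact procedure used: the default critical
value 1.833 is the one-sided 95% quantile of the t distribution with 9 degrees
of freedom, so it applies only to 10 folds, and the function name is
hypothetical.

```python
import math
import statistics


def arr_significantly_above_one(fold_arrs, t_crit=1.833):
    """Return True if the mean of the per-fold ARR values is significantly
    greater than 1. t_crit must match the number of folds (assumption:
    10 folds, one-sided 95% level)."""
    n = len(fold_arrs)
    mean = statistics.fmean(fold_arrs)
    sd = statistics.stdev(fold_arrs)
    if sd == 0.0:
        # All folds agree exactly; no variance to test against.
        return mean > 1.0
    t = (mean - 1.0) / (sd / math.sqrt(n))
    return t > t_crit
```

A predicate of this kind, evaluated on each similar dataset, is what decides
whether one algorithm counts as significantly better than another.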

After the reduction process has terminated, the resulting ranking will contain
the first item that was in R, plus a few more items from this ranking. Note
that this need not be a sequence of several consecutive items. Should the best
performing algorithm have a clone under a different name, this clone will be
left out.

Reducing a sequence of algorithms is beneficial in general, as it reduces
the time the user needs to spend experimenting. However, this process involves
certain risks. If the sequence is reduced too much, the optimal classifier may
be missed. If, on the other hand, the sequence is longer than required, the
total time required to determine which algorithm is best increases. It is
therefore necessary to have a way of evaluating different alternatives and
comparing them. This topic is described in the next section.

4 Comparing Different Rankings of Uneven Length 

Let us now consider the next objective - how to compare two rankings and 
determine which one is better. In our previous work [1,8,9] we have used a method 
that involves calculating a rank correlation coefficient to a ranking, constructed 
simply by running the algorithms on the dataset at hand. 

This method has one disadvantage: it does not enable us to compare rankings
of different sizes. We have therefore decided to use another method here. We
evaluate a reduced ranking as if it were a single algorithm with the accuracy
of the most accurate algorithm included in the ranking and with total time
equal to the sum of the times of all algorithms included. Given that ARR was
the performance measure used to construct and reduce the rankings, it makes
sense to use the same scheme in the evaluation.
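
Treating a ranking as a single algorithm amounts to a simple aggregation,
sketched below. The function name collapse_ranking and the representation of
each algorithm as an (accuracy, time) pair are assumptions made for
illustration.

```python
def collapse_ranking(results):
    """Summarize a (reduced) ranking as one pseudo-algorithm.

    results: one (accuracy, time) pair per algorithm in the ranking.
    Returns the best accuracy and the total time of running all of them.
    """
    if not results:
        raise ValueError("the ranking must contain at least one algorithm")
    best_accuracy = max(accuracy for accuracy, _ in results)
    total_time = sum(time for _, time in results)
    return best_accuracy, total_time
```

The resulting pair can then be compared with the pair of another ranking
using the ARR measure, regardless of how many algorithms each ranking contains.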

5 Experimental Evaluation 

The meta-data used was obtained from the METAL project
(http://www.metal-kdd.org). It contains results of 10 algorithms on 53
datasets. We have used three decision tree classifiers: C5.0 tree, C5.0 with
boosting [7] and Ltree, a decision tree that can introduce oblique decision
surfaces [3]. We have also used a linear discriminant (LD) [6], two neural
networks from the SPSS Clementine package (Multilayer Perceptron, MLP, and
Radial Basis Function Network, RBFN) and two rule-based systems, C5.0 rules and
RIPPER [2]. Finally, we used the instance-based learner (IBL) and naive Bayes
(NB) implementations from the MLC++ library [5].

As for the datasets, we have used all datasets from the UCI repository with 
more than 1000 cases, plus the Sisyphus data and a few applications provided 
by DaimlerChrysler. The algorithms were executed with default parameters and 
the error rate and time were estimated using 10-fold cross-validation. Not all of 
the algorithms were executed on the same machine and so we have employed a 
time normalization factor to minimize the differences. 

The rankings used to test the reduction algorithm were generated with the
ranking method reviewed briefly in Section 2. Three settings for the
compromise between accuracy and time were tested, with
$AccD \in \{0.1\%, 1\%, 10\%\}$, corresponding to growing importance of time.

Table 1. ARR scores for different values of AccD.

                 AccD
methods    10       100      1000
reduced    1.1225   1.0048   1.0037
top 3      1.1207   0.9999   0.9889
all        0.8606   0.9975   1.0093



The parameters used in the ranking reduction method were the following: 

— Performance information for the ten nearest neighbors (datasets) was used to 
determine the redundancy of an algorithm, corresponding to approximately 
20% of the total number of datasets. Although the method seems to be rel- 
atively insensitive to the number of neighbors, this percentage has obtained 
good results in previous experiments [8]. 

— An algorithm is dropped if it is not significantly better than another algo- 
rithm retained in the ranking on at least 10% of datasets selected (i.e. one 
dataset, since ten neighbors are used). 

— The significance of differences between two algorithms has been checked with 
the paired t test with a confidence level of 95%. 

We have compared our method with the full ranking and with a reduced 
ranking obtained by selecting the top three algorithms in the full ranking. 




Fig. 1. Average ARR scores over 53 datasets for different values of AccD.



As can be seen in Table 1 and Figure 1 the method proposed always performs 
better than the simple reduction method that uses the top three algorithms in 
the full ranking. 

As for the choice of executing all the algorithms, when time is given great
importance, this option is severely penalized in terms of the ARR score. When
accuracy is the dominant criterion, executing all the algorithms represents the
best option. This result is of course to be expected, since this strategy
consists of executing all algorithms with the aim of identifying the best
option. So, our method is suitable if time matters at least a bit.

Table 2. Complete and reduced rankings for the segment dataset for two
different values of AccD. The algorithms eliminated in the reduced ranking are
indicated by *.

        AccD
rank    10%                    1%
1       C5.0 tree              C5.0 w/ boosting
2       LD                     C5.0 rules
3       C5.0 rules *           C5.0 tree
4       C5.0 w/ boosting *     Ltree
5       Ltree *                IBL
6       IBL *                  LD
7       NB *                   NB *
8       RIPPER *               RIPPER *
9       MLP *                  MLP
10      RBFN *                 RBFN *

One interesting observation is that the method presented always obtains an 
average ARR larger than 1. This means that, for all compromises between accu- 
racy and time, as defined by the AccD parameter, it generates recommendations 
that are advantageous either in terms of accuracy, or in terms of time, when com- 
pared to the average of the other two methods. 

We wanted to analyze some of the reduced rankings to see how many algorithms
were actually dropped and which ones. Table 2 presents the recommendations
that were generated for the segment dataset.

When time is considered very important (AccD = 10%, equivalent to a compromise
between a 10x speed-up and a 10% drop in accuracy), only two algorithms
were retained in the ranking: C5.0 tree and LD. All the others, marked with *,
have been dropped. This result corresponds quite well to our intuitions. Both
C5.0 tree and linear discriminant are known to be relatively fast algorithms.
It is conceivable that this does not vary from dataset to dataset^1 and hence
variations in accuracy on different datasets did not affect things much.

The situation is quite different if we give more importance to accuracy. Let
us consider, for instance, the case when AccD = 1%. Here 7 out of 10 algorithms
were retained, including MLP, which appeared in the 9th position in the
original ranking. This is justified by the historical evidence that the method
takes into account when giving this recommendation. Each of the algorithms
retained must have done well on at least one dataset. It appears that NB,
RIPPER and RBFN do not satisfy this condition and hence were left out^2.

^1 Although this may not be true for datasets with an extremely large number of
attributes, we have used none of those.

^2 All results were obtained with default parameter settings. However, the
performance of some algorithms is highly dependent on the parameter setting
used. If parameter tuning were performed, some of the algorithms that were left
out could be competitive in relation to others. This issue was ignored in this
study.

6 Discussion

The evaluation measure presented can be improved. We can try to determine
whether the ARR measure associated with one reduced ranking is higher than
(or equal to, or lower than) the ARR measure associated with another. In
addition, it is possible to carry out statistical tests with the aim of
determining whether these differences are significant. This analysis can be
carried out for different settings of the constant AccD, which determines the
relative importance of time. This procedure can thus be used to identify the
algorithms lying on, or near, the so-called efficiency frontier [4].

Furthermore, note that this method can be used to compare reduced rankings
of different lengths. However, if we wanted to evaluate not only the final
outcome but also some intermediate results, this could be done as follows. In
this setting, a reduced ranking can be regarded as a plan to be executed
sequentially. Suppose a given plan $P_1$ consists of $n_1$ items (classifiers)
and plan $P_2$ of $n_2$ items, where, say, $n_2 < n_1$. The first $n_2$
comparisons can be carried out as described earlier, using the ARR measure.
After that we simply complete the shorter plan with $n_1 - n_2$ dummy entries.
Each entry just replicates the accuracy characterizing the plan. It is assumed
that no time is consumed in this process.
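
Completing the shorter plan can be sketched as follows. The sketch assumes
that "the accuracy characterizing the plan" is the best accuracy among its
items, in line with the evaluation scheme of Section 4; pad_plan is a
hypothetical name.

```python
def pad_plan(plan, length):
    """Extend `plan`, a list of (accuracy, time) pairs in execution order,
    to `length` items with dummy entries that cost no time."""
    if not plan:
        raise ValueError("the plan must contain at least one item")
    if length < len(plan):
        raise ValueError("the target length is shorter than the plan")
    # Assumption: the plan is characterized by its best accuracy.
    plan_accuracy = max(accuracy for accuracy, _ in plan)
    dummies = [(plan_accuracy, 0.0)] * (length - len(plan))
    return list(plan) + dummies
```

After padding, both plans have the same number of steps, so they can be
compared step by step.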

7 Conclusions 

We have presented a method to eliminate the algorithms in a ranking that are
expected to be redundant. The method exploits past performance information for
those algorithms. It addresses one shortcoming of existing ranking methods
[8,4], which do not indicate which of the ranked candidate algorithms are
really worth trying. As expected, the experiments carried out show that our
method performs better than the simple strategy of selecting the top three
algorithms. Also, when time is very important, our method represents a
significant improvement compared to running all algorithms. It is interesting
to note that when accuracy is very important, although running all algorithms
is the best strategy, the difference from our method is small.

Here we have concentrated on accuracy and time. It would be interesting to
take other evaluation criteria into account, e.g., interpretability of the
models.



Abstract. The k-Nearest Neighbor algorithm (k-NN) uses a classification
criterion that depends on the parameter k. Usually, the value of this
parameter must be determined by the user. In this paper we present an
algorithm based on the NN technique that does not take the value of
k from the user. Our approach evaluates the values of k that classify the
training examples correctly and takes the one that classifies the most
examples. As the user does not take part in the selection of the parameter k,
the algorithm is non-parametric. With this heuristic, we propose a simple
variation of the k-NN algorithm that is robust to noise present in the data.
The experiments, summarized in the last section, show that the error rate
decreases in comparison with the k-NN technique when the best k for each
database has been previously obtained.



1 Introduction 

In Supervised Learning, systems based on examples (CBR, Case Based Reasoning)
have been an object of study and improvement since their appearance at the end
of the sixties. These algorithms extract knowledge by means of inductive
processes from the partial descriptions given by the initial set of examples or
instances. The machine learning process is usually accomplished in two
functionally different phases. In the first phase, Training, a model of the
hyperspace is created from the labelled examples. In the second phase,
Classification, the new examples are classified or labelled based on the
constructed model. The classifier addressed in this paper belongs to the family
of nearest neighbor algorithms (from here on NN), where the training examples
are the model itself. NN assigns to each new query the label of its nearest
neighbor among those that are remembered from the Training phase (from here on
the set T).

In order to improve the accuracy with noise present in data, the k-NN algorithm
introduces a parameter k so that for each new example q to be classified
the classes of the k nearest neighbors of q are considered: q will be labelled with
the majority class and, in case of a tie, the tie is broken randomly. Other alternatives
consist in assigning the class whose average distance is the smallest, or in
introducing a heuristically obtained threshold k_1 < k so that the assigned class
will be the one with a number of associated examples greater than this threshold
[12]. Extending the classification criterion, the k-NN_wv algorithms (Nearest
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Fig. 1. The chosen value for k is decisive to classify a new example q by k-NN when 
this example is near the decision boundaries. For this example, the smallest value of the
parameter k that classifies q correctly is 5.



Neighbor Weighted Voted) assign weights to the prediction made by each exam- 
ple. These weights can be inversely proportional to the distance with respect to 
the example to be classified [5,7]. Therefore, the number k of examples observed 
and the metric used to classify a test example are decisive parameters. Usually 
k is heuristically determined by the user or by means of cross-validation [11]. 
The usual metrics of these algorithms are the Euclidean distance for continuous 
attributes and the Overlap distance for nominal attributes (both metrics were 
used in our experiments). 
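
As an illustration only (it is not part of the original experiments), the following Python sketch shows the two voting schemes discussed above: plain majority voting and weighted voting with weights inversely proportional to the distance. The function names, the `eps` smoothing term, the toy data and the tie-breaking rule (the tied class with the nearest neighbor wins, instead of a random choice) are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def mixed_distance(a, b):
    """Euclidean distance on numeric attributes, Overlap distance (0/1) on nominal ones."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += (x - y) ** 2            # numeric attribute (assumed scaled to [0, 1])
        else:
            total += 0.0 if x == y else 1.0  # nominal attribute
    return math.sqrt(total)

def k_nearest(train, query, k):
    """Return the k closest (distance, label) pairs; train is a list of (features, label)."""
    ranked = sorted(((mixed_distance(x, query), label) for x, label in train),
                    key=lambda pair: pair[0])
    return ranked[:k]

def knn_majority(train, query, k):
    """Plain k-NN: majority class among the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Assumed tie-break: among tied classes, the one owning the nearest neighbor wins.
    return next(label for _, label in neighbors if votes[label] == top)

def knn_weighted(train, query, k, eps=1e-9):
    """Weighted voting: each neighbor votes with weight 1 / distance."""
    scores = defaultdict(float)
    for dist, label in k_nearest(train, query, k):
        scores[label] += 1.0 / (dist + eps)
    return max(scores, key=scores.get)

# Toy data: one numeric and one nominal attribute per example.
train = [((0.0, "a"), "A"), ((0.1, "a"), "A"),
         ((0.8, "b"), "B"), ((0.9, "b"), "B"), ((1.0, "b"), "B")]
query = (0.2, "a")
print(knn_majority(train, query, 1), knn_majority(train, query, 5),
      knn_weighted(train, query, 5))
```

On this toy data, majority voting with k = 5 is dominated by the three distant examples of class B, whereas weighted voting still returns class A, which is the kind of sensitivity to k that motivates the rest of the paper.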

In recent years, interesting approaches have appeared that test new metrics
[15] or new data representations [3] to improve accuracy and computational
complexity. Nevertheless, in spite of its wide and diverse field of application,
determining with certainty when k-NN obtains higher accuracy than NN
[2] and vice versa [8] is still an open problem. In [6] it was proven that when the
distance among examples with the same class is smaller than the distance among
examples of different class, the probability of error for NN and k-NN tends to
0 and , respectively. However, the input data do not always follow this distribution,
which is why k-NN and k-NN_wv can improve the results given by NN with noise
present in the data.

In [13] the experimental results give rise to the two following hypotheses: a)
Noisy data need large values for k; b) The performance of k-NN is less sensitive
to the choice of a metric. Figure 1 illustrates this fact when the values of two
attributes from the Iris database are projected on the plane. The X-axis measures
the length of the petal and the Y-axis measures the width of the petal.

In [14] a study of the different situations in which k-NN improves the results
of NN is presented, and four classifiers are proposed (Locally Adaptive Nearest
Neighbor, localKNN_ks), where for each new example q to be classified the
parameter k takes a value k_q which is similar to the values that classified the M
nearest neighbors e_q of q. Using an approach similar to Wettschereck’s, we
propose a classification criterion for new examples by taking different values for k
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according to the most frequent ones that classified the original examples cor- 
rectly. To calculate such values the proximity of each example to its enemy is 
analysed, where the enemy is the nearest neighbor with a different class. A priori,
if the example to be classified is near examples of several different classes,
it might be classified correctly by only a few values of k. But if this example is
surrounded by neighbors with the same class, it could be classified correctly by
many different values.

This paper presents a method that reduces the training set and classifies according
to such a criterion with no need to analyse each particular case. In this way, the impure
regions and the border instances are better analysed in order to provide greater
robustness with noise present in the data.

In section 2 the proposed method and its computational complexity are
detailed. Section 3 describes the experiments and the results on databases from the
UCI repository [4], and section 4 summarizes the conclusions.

2 Description of the Algorithm 

2.1 Approach 

With the k-NN algorithm, if a new example is near the decision boundaries,
the resulting class depends on the parameter k. At worst, the percentages
of examples of each class are similar in these regions. In such a situation, the set
of classifying values k_ei associated with each example e_i in this region
can be large or empty, i.e. some examples will not have any associated value k_ei
that classifies them correctly by k-NN. So, this information (the classifying values
associated with the nearest neighbors of a new query q) may not be relevant for
classifying a new query q by k-NN.

We do not assume that it is possible to determine the values of the parameter k
which allow the examples in overlapped regions to be classified. However, we assume
that it is possible to improve the accuracy if the k-NN algorithm is evaluated
several times on these regions. The idea is simply to give more than one
opportunity to the example that is to be classified. If the example to be classified
is a central example, this criterion will not have any effect. If it is a border
example, the accuracy can improve. Thus, the disturbing effect caused by
noise and the proximity to enemies can be smoothed. The consequences of such
a bias can be explained in three cases:

— If q is a central example, the majority class might almost always be the same
one for each evaluation.

— If q is a noise example, either there will not be an associated value k_q that
classifies q correctly or k_q will be large.

— If q is a border example, several evaluations can avoid classification
errors.

Figures 2 and 3 illustrate these facts by means of projections on the plane of
the values of two attributes of the Horse-Colic database. In the first case (Figure
2) the value of k is only slightly relevant, whereas in the second case (Figure 3) such
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Fig. 2. Horse Colic database. If the new 
example to be classified is central, the 
majority class in each evaluation might 
be the same almost always. 



Fig. 3. Horse Colic database. If the new 
example to be classified is a border ex- 
ample, several evaluations of the k-NN 
algorithm can avoid some errors. 



value is critical. Therefore, our problem is to find either the limits
between which the k-NN algorithm will be applied for each new example q or
the values, which are not necessarily consecutive, {k_q1, k_q2, ...} from which the
majority class is calculated. The method is called fNN (k-Frequent Nearest
Neighbors) since it takes the most frequent values of k among
those that classified correctly the examples of each database. In this process,
there are no parameters given by the user, since these values are calculated locally
for each database.



2.2 The Algorithm 

Let n be the number of examples of the Training set T. Let kNN(e, i) be the
i-th nearest neighbor of an example e within T. We denote majorityClass(e, i..j) as
the majority class between the i-th and the j-th neighbors of e. fNN associates
with each example e_i two values:

1. kCMin_i: The smallest k that classifies correctly the example e_i by using the
k-NN algorithm, such that (see Figure 4):

$$\forall j \in [1, kCMin_i) \mid \mathit{Class}(kNN(e_i, j)) = \mathit{Class}(e_i) \Rightarrow \mathit{majorityClass}(e_i, 1..j) \neq \mathit{Class}(e_i) \qquad (1)$$

2. kCMax_i: If kCMin_i was found, then kCMax_i ≥ kCMin_i and:

$$\forall j \in [kCMin_i, kCMax_i] \Rightarrow \mathit{Class}(e_i) = \mathit{Class}(kNN(e_i, j)) \qquad (2)$$

With these values, a new value kLim is calculated which satisfies the following
property:

$$\forall e_i \in T,\ j \in [\min(kCMin_i), kLim] \Rightarrow \exists e_k \in T \mid kCMin_k = j \qquad (3)$$
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Fig. 4. Example of kCMin and kCMax for an example e with class A. The ties are 
broken with the nearest class. Majority class is not A from k = 1 to k = 8. When k = 9
the majority class is A (kCMin_e = 9). All the examples have class A from k = 9 to
k = 12. Finally, the thirteenth neighbor of e has class C, so that kCMax_e = 12.



where min(kCMin_i) is the smallest kCMin_i value over all examples.

Those examples that either do not have any associated value kCMin_i or
have an associated value kCMin_i > kLim are considered outliers and are
removed. The resulting reduced set (from here on T_f) will be used as the Training
model for our algorithm.
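
The definitions of kCMin, kCMax and kLim and the removal of outliers can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, plain Euclidean distance (`math.dist`, Python 3.8+) is assumed for purely numeric attributes, and ties in the majority vote go to the tied class that appears first among the neighbors (the "nearest class" rule mentioned in the caption of Figure 4).

```python
import math
from collections import Counter

def majority_class(ordered_classes):
    """Majority class of a neighbor list sorted by increasing distance.
    Assumption: ties go to the tied class that occurs first (the nearest one)."""
    counts = Counter(ordered_classes)
    top = max(counts.values())
    return next(c for c in ordered_classes if counts[c] == top)

def k_cmin_cmax(own_class, ordered_classes):
    """Return (kCMin, kCMax) for one example, or (None, None) if no k classifies it."""
    n = len(ordered_classes)
    k_cmin = next((k for k in range(1, n + 1)
                   if majority_class(ordered_classes[:k]) == own_class), None)
    if k_cmin is None:
        return None, None
    k_cmax = k_cmin
    # Extend while the next neighbor still has the example's own class (property 2).
    while k_cmax < n and ordered_classes[k_cmax] == own_class:
        k_cmax += 1
    return k_cmin, k_cmax

def k_limits(k_cmins):
    """Return (min kCMin, kLim): every j in that interval is the kCMin of some example.
    Raises ValueError if no example has a kCMin at all."""
    found = {k for k in k_cmins if k is not None}
    k_min = min(found)
    k_lim = k_min
    while k_lim + 1 in found:
        k_lim += 1
    return k_min, k_lim

def reduce_training_set(train):
    """train: list of (features, label). Returns (T_f, min kCMin, kLim)."""
    k_cmins = []
    for i, (x, label) in enumerate(train):
        others = sorted((math.dist(x, y), j, lab)
                        for j, (y, lab) in enumerate(train) if j != i)
        k_cmins.append(k_cmin_cmax(label, [lab for _, _, lab in others])[0])
    k_min, k_lim = k_limits(k_cmins)
    reduced = [ex for ex, k in zip(train, k_cmins) if k is not None and k <= k_lim]
    return reduced, k_min, k_lim

# Neighbor classes consistent with the situation described in Figure 4 (example of class A).
figure4_like = ["B", "A", "B", "A", "B", "A", "B", "A", "A", "A", "A", "A", "C"]
print(k_cmin_cmax("A", figure4_like))
```

For the neighbor sequence above, which was chosen to match the caption of Figure 4, the sketch yields kCMin = 9 and kCMax = 12.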

In order to classify a new example q, the k-NN algorithm is applied several
times to the same example q by varying the value of k. These values belong to the
interval [min(kCMin_i), kLim]. The assigned label will be the one, among those of
the nearest neighbors of q, that is most frequent in the kr evaluations of k-NN. Thus,
the computational complexity of fNN is:
$O(n^2 \cdot (\log n + 1) + n \cdot (\log n + \delta + 2) + kLim^2)$.
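
The classification step can be sketched as below. It is a hedged illustration under several assumptions: the reduced set and the interval limits are taken as given, Euclidean distance on numeric attributes is used, and the names `knn_label` and `fnn_classify` are hypothetical. The default mode counts how often each label wins over the evaluations, as the text describes; the optional `weighted` mode mimics the `1/Average_Dist` increment of the pseudo code in Figure 5, where the average distance is interpreted here (an assumption) as the mean distance from q to the winning class among the k neighbors.

```python
import math
from collections import Counter, defaultdict

def knn_label(train, query, k):
    """One k-NN evaluation: returns (winning label, mean distance to that label's neighbors)."""
    ranked = sorted((math.dist(x, query), i, label)
                    for i, (x, label) in enumerate(train))[:k]
    labels = [label for _, _, label in ranked]
    counts = Counter(labels)
    top = max(counts.values())
    winner = next(l for l in labels if counts[l] == top)  # ties: nearest class wins
    dists = [d for d, _, l in ranked if l == winner]
    return winner, sum(dists) / len(dists)

def fnn_classify(reduced_train, query, k_min, k_lim, weighted=False):
    """Evaluate k-NN for every k in [k_min, k_lim] and return the best-scoring label."""
    scores = defaultdict(float)
    for k in range(k_min, k_lim + 1):
        label, avg_dist = knn_label(reduced_train, query, k)
        scores[label] += 1.0 / (avg_dist + 1e-12) if weighted else 1.0
    return max(scores, key=scores.get)

# Toy reduced set: the query lies close to a single B example inside an A region.
t_f = [((0.0,), "A"), ((0.2,), "A"), ((0.4,), "A"),
       ((0.55,), "B"), ((1.1,), "B"), ((1.2,), "B")]
q = (0.5,)
print(knn_label(t_f, q, 1)[0], fnn_classify(t_f, q, 1, 5),
      fnn_classify(t_f, q, 1, 5, weighted=True))
```

In this toy case 1-NN answers B, the frequency vote over k = 1..5 answers A, and the distance-weighted variant answers B again, so the two readings of the voting rule are not interchangeable.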

In the first phase the set T_f is generated. The n - 1 nearest neighbors of
each of the n examples of the Training set are ordered
($\Theta(n \cdot (n + n \cdot \log n))$): for each example, the distance to all
its neighbors is computed ($\Theta(n)$) and then the associated list is sorted
($\Theta(n \log n)$). After this procedure kLim is found ($\Theta(n \cdot \delta)$,
with $\min(kCMin_i) \leq \delta \leq kLim$) and the outliers are removed ($\Theta(n)$).

In the second phase a test example is classified
($\Theta(n \cdot (\log n + 1) + kLim^2)$).
The pseudo code for the fNN algorithm is shown in Figure 5, where n is the
original number of examples and c the number of different labels for the class.
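
As a consistency check, the per-phase costs quoted above can be added up; grouping the terms in n reproduces the overall bound given for fNN (the underbrace labels are ours):

```latex
\begin{align*}
&\underbrace{n\,(n + n\log n)}_{\text{ordering neighbors}}
 + \underbrace{n\,\delta}_{\text{finding } kLim}
 + \underbrace{n}_{\text{removing outliers}}
 + \underbrace{n\,(\log n + 1) + kLim^{2}}_{\text{classifying a query}} \\
&\quad = n^{2}(\log n + 1) + n\,(\delta + 1 + \log n + 1) + kLim^{2} \\
&\quad = n^{2}(\log n + 1) + n\,(\log n + \delta + 2) + kLim^{2}.
\end{align*}
```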



3 Results 

To carry out the method and the test, the Euclidean distance for continuous
attributes and the Overlap distance for nominal attributes were used. The
values of the continuous attributes were normalized to the interval [0,1].
Examples with a missing class were removed and attributes with missing values were
treated with the mean or mode, respectively. fNN was tested on 20 databases
from the Machine Learning Database Repository at the University of California,
Irvine [4]. In order to reduce statistical variation, each experiment was executed
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function fNN (Train: SET; query : INSTANCE) return (label : Integer) 
var

T_f: Set; k, k_Lim, label: Integer; k_Vect: VECTOR [n][2] Of Integer
frequencies: VECTOR [c] Of Real
beginF

// Calculating kCMin, kCMax for each example.

Calculate_MinMax_Values_K (Train, k_Vect)

// Finding the limits for the evaluations of kNN.

Calculate_Limits (k_Vect, k_Lim)

// Removing the examples whose kCMin does not belong to [1, k_Lim].

T_f := Delete_Outliers (Train, k_Lim)
for k := Min (kCMin) to k_Lim
    label := Classify_KNN (query, T_f, k)

    frequencies [label] := frequencies [label] + 1/Average_Dist (query, label, k)
return (frequencies.IndexOfMaxElement ())
endF



Fig. 5. Pseudo code for fNN. 



by means of 10-fold cross-validation. fNN was compared with k-NN using 25
different values of k (the odd numbers belonging to interval [1, 51]). This limit was
fixed after observing for all databases how the accuracy decreased from a value
near the best k for each database (33 being the maximum value, for the Heart
Cleveland database). Table 1 reports the main results obtained. The average
accuracy with the associated standard deviation and the computational cost
of fNN are shown in Columns 2a and 2b, respectively. The k-NN algorithm
is included for comparison using the best k for each database (Column 3a) and
the best average value (k=1) for all databases (Column 1a). The computational
costs of both k-NN variants were very similar and are shown in Column 1b. Column 3b
shows the best value of k for each database by k-NN. Column 2c shows the size
of T_f relative to the Training set, and Column 2d shows the values of kLim for
each database, i.e. the limit for k by fNN. The databases marked with * are those
where fNN shows an improvement over 1-NN according to Student's t-test with
α = 0.05. We can observe in Table 1 that fNN obtained better accuracy than
1-NN for 13 databases where the best k for the k-NN algorithm was a high value,
so that:

— If kLim < kbest for k-NN, then fNN provides higher accuracy than 1-NN.

— The percentage of examples that are excluded from T_f is a minimum error
bound for k-NN.



Table 1. Results for 20 databases from the UCI repository using 10-fold cross-
validation. Column 1 shows the average accuracy with the standard deviation and the
computational cost of k-NN with k=1 (the best value for all databases). Column 2
shows the same figures obtained by fNN, the percentage of examples retained
from the Training set and the value of kLim, i.e. the limit for k by fNN. Column 3
shows the best accuracy with the standard deviation of the k-NN algorithm when the
best k is found. This best k was searched for among the odd numbers belonging to
the interval [1,51].

                  ----- k-NN (k=1) -----  --------------- fNN ----------------  --- k-NN (best k) ---
Database          Pred. Acc.      Cost    Pred. Acc.      Cost    % T_f  kLim   Pred. Acc.      best k
                  (1a)            (1b)    (2a)            (2b)    (2c)   (2d)   (3a)            (3b)
Anneal            91.76 ±2.4      1.10    90.32 ±1.8      15.7    96.5   9      91.76 ±2.4      1
Balance Scale*    77.76 ±4.8      0.25    89.44 ±1.5      4.0     89.7   11     89.76 ±1.4      21
B. Cancer (W)     95.56 ±2.2      0.20    96.57 ±1.7      2.91    92.1   9      96.85 ±2.0      17
Credit Rating*    [missing] ±2.2  0.56    87.11 ±1.8      7.59    91.9   7      87.68 ±1.6      13
German Credit     72.29 ±2.9      1.42    74.61 ±3.4      18.2    89.1   21     73.09 ±4.2      17
Glass             70.19 ±2.0      0.04    69.16 ±1.3      0.64    84.4   9      70.19 ±2.0      1
Heart D. (C)*     74.92 ±2.5      0.09    81.52 ±2.0      1.32    89.3   13     83.17 ±2.7      33
Hepatitis*        81.29 ±0.8      0.03    87.11 ±0.9      0.43    88.5   9      85.16 ±1.0      7
Horse Colic       67.93 ±3.0      0.22    69.02 ±1.6      2.90    83.7   11     70.38 ±2.6      7
[name missing]    86.61 ±1.9      0.29    85.18 ±2.0      3.66    91.1   13     86.61 ±1.9      1
Iris              95.33 ±0.8      0.01    96.00 ±0.8      0.29    95.6   1      97.33 ±0.8      15
Pima Diabetes     71.21 ±5.0      0.56    74.09 ±3.9      7.62    87.7   15     75.52 ±4.2      17
Primary Tumor*    35.69 ±1.3      0.05    42.19 ±1.5      0.81    54.9   11     43.07 ±1.5      29
Sonar             86.54 ±1.5      0.16    86.06 ±1.2      1.91    94.1   5      86.54 ±1.5      1
Soybean           90.92 ±3.5      0.65    91.07 ±2.9      8.42    95.9   13     90.92 ±3.5      1
Vehicle           70.06 ±3.0      1.12    69.97 ±2.9      13.5    89.9   13     70.06 ±3.0      1
Voting            92.18 ±1.6      0.07    92.64 ±1.3      1.18    95.1   3      93.56 ±1.0      5
Vowel             99.39 ±0.9      1.19    98.79 ±1.3      16.1    99.1   1      99.39 ±0.9      1
Wine              96.07 ±1.1      0.03    96.63 ±0.7      0.54    98.1   5      97.75 ±0.7      31
Zoo*              97.03 ±0.6      0.01    93.07 ±1.1      0.19    95.6   1      97.03 ±0.6      1
Average           81.67 ±2.2      0.40    83.53 ±1.8      5.39    90.11  9      84.29 ±2.0      11


4 Conclusions

An easy variation of the k-NN algorithm has been explained and evaluated in
this paper. Experiments with commonly used databases indicate that there exist
domains where classification by the k-NN algorithm is very sensitive to the
parameter k. For these input data, we can summarize several aspects:

— Without the need for any parameter, fNN is a reduction and classification
technique that keeps the average accuracy of the k-NN algorithm.

— kLim and the size of T_f compared to the size of T are an approximate
indicator of the percentage of examples that cannot be correctly classified
by the k-NN algorithm.

— The reduction of the database is very similar to the reduction made by
CNN [15], so that fNN is less restrictive than CNN. With large databases,
this reduction can accelerate the learning process for the k-NN algorithm.




Non-parametric Nearest Neighbor with Local Adaptation 






5 Future Work 

Currently we are testing fNN with other classifiers. In particular, we have chosen
two systems, C4.5 [9] and HIDER [10], which generate decision trees and
axis-parallel decision rules, respectively. Because fNN performs a prior reduction,
we have chosen the method EOP [1], which reduces databases while conserving the
decision boundaries that are parallel to the axes.
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Abstract. This paper describes an automatic clustering strategy for ac- 
quiring selection restrictions. We use a knowledge-poor method merely 
based on word cooccurrence within basic syntactic constructions; hence, 
neither semantically tagged corpora nor man-made lexical resources are
needed for generalising semantic restrictions. Our strategy relies on two
basic linguistic assumptions. First, we assume that two syntactically
related words impose semantic selectional restrictions on each other
(co-specification). Second, it is also claimed that two syntactic contexts
impose the same selection restrictions if they cooccur with the same words
(contextual hypothesis). In order to test our learning method, preliminary 
experiments have been performed on a Portuguese corpus. 



1 Introduction 

The general aim of this paper is to describe a particular corpus-based method 
for semantic information extraction. More precisely, we implement a knowledge- 
poor system that uses syntactic information to acquire selection restrictions and 
semantic preferences constraining word combination. 

According to Gregory Grefenstette [9,10], knowledge-poor approaches use no 
presupposed semantic knowledge for automatically extracting semantic information.
They are characterised as follows: no domain-specific information is available,
no semantic tagging is used, and no static sources such as machine-readable
dictionaries or handcrafted thesauri are required. Hence, they differ from
knowledge-rich approaches in the amount of linguistic knowledge they need to
activate the semantic acquisition process. Whereas knowledge-rich approaches require
previously encoded semantic information (semantically tagged corpora and/or man-made
lexical resources [17,6,1]), knowledge-poor methods only need a coarse-grained
notion of linguistic information: word cooccurrence. In particular, the main aim 
of knowledge-poor approaches is to calculate the frequency of word cooccur- 
rences within either syntactic constructions or sequences of n-grams in order to 
extract semantic information such as selection restrictions [19,11,3], and word 
ontologies [12,15,9,13]. Since these methods do not require previously defined 
semantic knowledge, they overcome the well-known drawbacks associated with 
handcrafted thesauri and supervised strategies. 

* Research supported by the PRAXIS XXI project, FCT/MCT, Portugal 
** Research sponsored by CAPES and PUCRS - Brazil 
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Nevertheless, our method differs from standard knowledge-poor strategies on 
two specific issues: both the way of extracting word similarity and the way of 
defining syntactic contexts. We use the contextual hypothesis to characterise 
word similarity and the co-specification hypothesis to define syntactic contexts. 



1.1 The Contextual Hypothesis 

In most knowledge-poor approaches to selection restriction learning, the process 
of inducing and generalising semantic information from word cooccurrence fre- 
quencies consists in automatically clustering words considered as similar. The 
best-known strategy for measuring word similarity is based on Harris’ distribu- 
tional hypothesis. According to this assumption, words cooccurring in similar 
syntactic contexts are semantically similar and, then, should be clustered into 
the same semantic class. However, the learning methods based on the
distributional hypothesis may cluster into the same class words that fill different
selection restrictions. Let us analyse the following examples taken from [20]:

(a) John worked till late at the council 

(b) John worked till late at the office 

(c) the council stated that they would raise taxes 

(d) the mayor stated that he would raise taxes 

On the basis of the distributional hypothesis, since council behaves similarly to 
office and mayor they would be clustered together into the same word class. 
Nevertheless, the bases for the similarity between council and office are differ- 
ent from those relating council and mayor. Whereas council shares with office 
syntactic contexts associated mainly with LOCATIONS (e.g., the argument of 
work at in phrases (a) and (b)), council shares with mayor contexts associated 
with AGENTS (e.g., the subject of state in phrases (c) and (d)). That means 
that a polysemous word like council should be clustered into various semantic 
word classes, according to its heterogeneous syntactic distribution. Each partic- 
ular sense of the word is related to a specific type of distribution. Given that 
the clustering methods based on the distributional hypothesis solely take into 
account the global distribution of a word, they are not able to separate and 
acquire its different contextual senses. 

In order to extract contextual word classes from the appropriate syntactic 
constructions, we claim that similar syntactic contexts share the same semantic 
restrictions on words. Instead of computing word similarity on the basis of the 
too coarse-grained distributional hypothesis, we measure the similarity between 
syntactic contexts in order to identify common selection restrictions. More pre- 
cisely, we assume that two syntactic contexts occurring with (almost) the same 
words are similar and, then, impose the same semantic restrictions on those 
words. That is what we call contextual hypothesis. Semantic extraction strate- 
gies based on the contextual hypothesis may account for the semantic variance 
of words in different syntactic contexts. Since these strategies are concerned with 
the extraction of semantic similarities between syntactic contexts, words will be 
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clustered with regard to their specific syntactic distribution. Such clusters repre- 
sent context-dependent semantic classes. Except the cooperative system Asium 
introduced in [4,5,2], little or no research on semantic extraction has been based
on such a hypothesis.
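
The difference between the two hypotheses can be illustrated with a small Python sketch built on examples (a)-(d). The context labels (`work_at:complement`, `state:subject`) and function names are hypothetical and only serve the illustration; they are not the representation used by the system.

```python
from collections import defaultdict

# Hypothetical (syntactic context, word) observations modelled on examples (a)-(d).
observations = [
    ("work_at:complement", "council"),   # (a)
    ("work_at:complement", "office"),    # (b)
    ("state:subject", "council"),        # (c)
    ("state:subject", "mayor"),          # (d)
]

def contexts_of(word, pairs):
    """All contexts in which a word has been observed (its global distribution)."""
    return {context for context, w in pairs if w == word}

def distributional_neighbours(word, pairs):
    """Distributional view: words sharing at least one context with `word`."""
    own = contexts_of(word, pairs)
    return {w for context, w in pairs if context in own and w != word}

def contextual_classes(pairs):
    """Contextual view: one word set per syntactic context."""
    classes = defaultdict(set)
    for context, w in pairs:
        classes[context].add(w)
    return dict(classes)

print(distributional_neighbours("council", observations))
print(contextual_classes(observations))
```

Under the global view, council is related to both office and mayor at once, while the per-context view keeps the LOCATION-like set {council, office} apart from the AGENT-like set {council, mayor}.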

1.2 Co-specification 

Traditionally, a binary syntactic relationship is constituted by both the word 
that imposes linguistic constraints (the predicate) and the word that must fill 
such constraints (its argument). In a syntactic relationship, each word plays a 
fixed role. The argument is perceived as the word specifying or modifying the 
syntactic-semantic constraints imposed by the predicate, while the latter is viewed as
the word specified or modified by the former. However, recent linguistic research
assumes that the two words related by a syntactic dependency are mutually
specified [16,8]. Each word imposes semantic conditions on the other word of the
dependency, and each word elaborates them. Consider the relationship between
the polysemic verb load and the polysemic noun books in the unambiguous
expression to load the books. On the one hand, the polysemic verb load conveys
at least two alternate meanings: “bringing something to a location” (e.g., Ann
loaded the hay onto the truck), and “modifying a location with something” (e.g.,
Ann loaded the truck with the hay). This verb is disambiguated by taking into
account the sense of the words with which it combines within the sentence. On
the other hand, the noun book(s) is also a polysemic expression. Indeed, it refers
to different types of entities: “physical objects” (rectangular book), and
“symbolic entities” (interesting book). Yet, the constraints imposed by the words with
which it combines allow the noun to be disambiguated. Whereas the adjective
rectangular activates the physical sense of book, the adjective interesting makes 
reference to its symbolic content. 

In to load the books, the verb load activates the physical sense of the noun,
while books leads load to refer to the event of bringing something to a
location. The interpretation of the complex expression is no longer ambiguous. Both
expressions, load and books, cooperate to mutually restrict their meaning. The
process of mutual restriction between two related words is called
“co-specification” or “co-composition” by Pustejovsky [16]. Co-specification is
based on the following idea. Two syntactically dependent expressions are no longer
interpreted as a standard “predicate-argument” pair, where the predicate is the
active function imposing the semantic preferences on a passive argument, which matches
such preferences. On the contrary, each word of a binary dependency is perceived 
simultaneously as a predicate and an argument. That is, each word both imposes 
semantic restrictions and matches semantic requirements. When one word is in- 
terpreted as an active functor, the other is perceived as a passive argument, and 
conversely. Both dependent expressions are simultaneously active and passive 
compositional terms. Unlike most work on selection restrictions learning, our 
notion of “predicate-argument” frame relies on the active process of semantic 
co-specification, and not on the trivial operation of argument specification. This 
trivial operation only permits the one-way specification and disambiguation of 
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the argument by taking into account the sense of the predicate. Specification 
and disambiguation of the predicate by the argument is not considered. 

In this paper, we describe a knowledge-poor unsupervised method for ac- 
quiring selection restrictions, which is based on the contextual and the co- 
specification hypotheses. The different parts of this method will be outlined 
in the next section. 



2 System Overview 

To evaluate the hypotheses presented above, a software package was developed
to support the automatic acquisition of semantic restrictions. The system
consists of four related modules, illustrated in Figure 1. In the following
paragraphs we merely outline the overall functionalities of these modules. Then,
in the remainder of the paper, we describe in detail the specific objects and
processes of each module.

Parsing: The raw text is tagged [14] and partially analysed [18]. Then, an
attachment heuristic is used to identify binary dependencies. The result is
a list of cooccurrence triplets containing the syntactic relationship and the
lemmas of the two related head words. This module will be described in
section 3.1.

Extracting: The binary dependencies are used to extract the syntactic contexts.
Unlike most work on selection restrictions learning, the characterisation of
syntactic contexts relies on the dynamic process of co-specification. Then,
the word sets that appear in those contexts are also extracted. The result is
a list of contextual word sets. This module will be analysed in section 3.2.

Filtering: Each pair of contextual word sets is statistically compared using a
variation of the weighted Jaccard Measure [9]. For each pair of contextual
sets considered as similar, we select only the words that they share. The
result is a list of semantically homogeneous word sets, called basic classes.
Section 4.1 describes this module.

Clustering: Basic classes are successively aggregated by a conceptual
clustering method to induce more general classes, which represent extensionally
the selection restrictions of syntactic contexts. We present this module in
section 4.2. Finally, the classes obtained by clustering are used to update
the subcategorisation information in the dictionary.¹

Fig. 1. System modules. PARSING: tagging and shallow parsing; attachment
heuristics identifying binary dependencies. EXTRACTING: extraction of syntactic
contexts; extraction of contextual word sets. FILTERING: identifying similar
contextual sets; extraction of basic classes by intersecting similar contextual
sets. CLUSTERING: calculating the weight of basic classes; building new classes
at level n. The output of the last module is used for the dictionary update.

The system was tested over the Portuguese text corpus P.G.R.² Some results
are analysed in section 4.3. Using specialised text corpora makes the learning
task easier, given that we have to deal with a limited vocabulary
with reduced polysemy. Furthermore, since the system is not dependent on a
specific language such as Portuguese, it could be applied to any natural
language.

¹ The first tasks, tagging and parsing, are based on a non domain-specific dictionary for
Portuguese, which merely contains morphosyntactic information. The
subcategorisation restrictions extracted by our method are used to extend the
morphosyntactic information of the dictionary.

² P.G.R. (Portuguese General Attorney Opinions) consists of case-law
documents.
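
To make the Filtering step of the overview more concrete, the sketch below compares contextual word sets pairwise and intersects those judged similar. It is only an approximation under stated assumptions: it uses the plain (unweighted) Jaccard coefficient rather than the weighted variation of [9], the threshold of 0.5 is arbitrary, and the contexts and words are invented for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Plain Jaccard coefficient of two word sets (assumption: unweighted)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def basic_classes(contextual_sets, threshold=0.5):
    """For every pair of similar contexts, keep only the words they share."""
    classes = []
    for c1, c2 in combinations(sorted(contextual_sets), 2):
        if jaccard(contextual_sets[c1], contextual_sets[c2]) >= threshold:
            classes.append(((c1, c2), contextual_sets[c1] & contextual_sets[c2]))
    return classes

# Hypothetical contextual word sets (context name -> words seen in that context).
contextual_sets = {
    "dobj:ratify": {"law", "treaty", "decree"},
    "subj:enter_into_force": {"law", "treaty", "decree", "agreement"},
    "dobj:eat": {"apple", "bread"},
}
print(basic_classes(contextual_sets))
```

Only the two legal contexts pass the threshold, and the resulting basic class contains the words they have in common; the unrelated context contributes nothing.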

3 Identification of Binary Dependencies and Extraction 
of Syntactic Contexts 

Binary dependencies and syntactic contexts are directly associated with the no- 
tion of selection restrictions. Selection restrictions are the semantic constraints 
that a word needs to match in order to be syntactically dependent and attached 
to another word. According to the co-specification hypothesis, two dependent 
words can be analysed as two syntactic contexts of specification. So, before de- 
scribing how selection restrictions are learned, we start by defining first how 
binary dependencies are identified, and second how syntactic contexts are ex- 
tracted from binary dependencies. 



3.1 Binary Dependencies 

We assume that basic syntactic contexts are extracted from binary syntactic 
dependencies. We use both a shallow syntactic parser and a particular attach- 
ment heuristic to identify binary dependencies. The parser produces a single 
partial syntactic description of sentences, which are analysed as sequences of 
basic chunks (NP, PP, VP, . . . ). Then, attachment is temporarily resolved by a 
simple heuristic based on right association (a chunk tends to attach to another
chunk immediately to its right). Finally, we consider that the word heads of two
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attached chunks form a binary dependency. It can be easily seen that syntactic 
errors may appear since the attachment heuristic does not take into account
distant dependencies.³ Because of such attachment errors, we argue that
the binary dependencies identified by our basic heuristic are mere hypotheses on
attachment; hence they are mere candidate dependencies. Candidate dependencies
will be checked by taking into account further information on the attached
words. In particular, a candidate dependency will be checked and finally verified
only if the two attached words impose selection restrictions on each other.
Therefore, the test confirming or rejecting a particular attachment relies on the
semantic information associated with the related words. Let us first describe the
internal structure of a candidate dependency between two words.
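
A rough Python sketch of how adjacent chunks could yield candidate dependencies is given below. The `Chunk` type, the relation-labelling rules (NP before VP gives a subject, VP before NP a direct object, anything before a PP a prepositional object) and the example sentence are assumptions introduced for illustration; the paper does not spell out these rules.

```python
from typing import NamedTuple, Optional

class Chunk(NamedTuple):
    kind: str                    # "NP", "VP" or "PP"
    head: str                    # lemma of the head word
    prep: Optional[str] = None   # preposition, for PP chunks only

def candidate_dependencies(chunks):
    """Attach each chunk to its immediate neighbor and emit (relation, head, complement)."""
    triples = []
    for left, right in zip(chunks, chunks[1:]):
        if left.kind == "NP" and right.kind == "VP":
            triples.append(("subj", right.head, left.head))
        elif left.kind == "VP" and right.kind == "NP":
            triples.append(("dobj", left.head, right.head))
        elif right.kind == "PP":
            triples.append((right.prep, left.head, right.head))
    return triples

# "the council ratified the law of the republic", already chunked (hypothetical input).
sentence = [Chunk("NP", "council"), Chunk("VP", "ratify"),
            Chunk("NP", "law"), Chunk("PP", "republic", "of")]
print(candidate_dependencies(sentence))
```

Because only adjacent chunks are considered, a sentence such as "ratified the law in parliament" would attach the PP to law rather than to ratify, which is exactly the kind of error that makes these dependencies mere candidates.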

A candidate syntactic dependency consists of two words and the hypothetical
grammatical relationship between them. We represent a dependency as the
following binary predication: $(r;\, w1^{\downarrow}, w2^{\uparrow})$. This binary
predication is constituted by the following entities:

— the binary predicate r, which can be associated to specific prepositions,
subject relations, direct object relations, etc.;

— the roles of the predicate, “↓” and “↑”, which represent the head and
complement roles, respectively;

— the two words holding the binary relation: w1 and w2.

Binary dependencies denote grammatical relationships between the head and
its complement. The word indexed by “↓” plays the role of head, whereas the
word indexed by “↑” plays the role of complement. Therefore, w1 is perceived as
the head and w2 as the complement.

Furthermore, the binary dependencies (i.e., grammatical relationships) we 
have considered are the following: subject (noted subj), direct object (noted 
dobj), prepositional object of verbs, and prepositional object of nouns, both 
noted by the specific preposition. 

3.2 Extraction of Syntactic Contexts and Co-specification 

Syntactic contexts are abstract configurations of specific binary dependencies.
We use λ-abstraction to represent the extraction of syntactic contexts. A
syntactic context is extracted by λ-abstracting one of the related words of a
binary dependency. Thus, two complementary syntactic contexts can be
λ-abstracted from the binary predication associated with a syntactic
dependency: [λx↓ (r; x↓, w2↑)] and [λx↑ (r; w1↓, x↑)].

The syntactic context of word w2, [λx↓ (r; x↓, w2↑)], can be defined
extensionally as the set of words that are the head of w2. The exhaustive
enumeration of every word that can occur with that syntactic frame enables us to
characterise its selection restrictions extensionally. Similarly, the syntactic
context of word w1, [λx↑ (r; w1↓, x↑)], represents the set of words that are
complements of w1. This set is perceived as the extensional definition of the
selection restrictions imposed by the syntactic context. Consider Table 1. The
left column contains expressions constituted by two words syntactically related
by a particular type of syntactic dependency. The right column contains the
syntactic contexts extracted from these expressions. For instance, from the
expression presidente da república (president of the republic), we extract two
syntactic contexts: [λx↓ (de; x↓, república↑)], where república plays the role
of complement, and [λx↑ (de; presidente↓, x↑)], where presidente is the head.

* The errors are caused not only by the overly restrictive attachment
heuristic, but also by other sources of error, e.g., words missing from the
dictionary, incorrectly tagged words, other parser limitations, etc.



Table 1. Syntactic contexts extracted from binary expressions

Binary expression                   Syntactic contexts

presidente da república             [λx↓ (de; x↓, república↑)], [λx↑ (de; presidente↓, x↑)]
(president of the republic)

nomeação do presidente              [λx↓ (de; x↓, presidente↑)], [λx↑ (de; nomeação↓, x↑)]
(nomination for president)

nomeou o presidente                 [λx↓ (dobj; x↓, presidente↑)], [λx↑ (dobj; nomear↓, x↑)]
(nominated the president)

discutiu sobre a nomeação           [λx↓ (sobre; x↓, nomeação↑)], [λx↑ (sobre; discutir↓, x↑)]
(discussed about the nomination)



Since syntactic configurations impose specific selectional preferences on words,
the words that match the semantic preferences (or selection restrictions) required
by a syntactic context should constitute a semantically homogeneous word class.
Consider the two contexts extracted from presidente da república. On the
one hand, context [λx↑ (de; presidente↓, x↑)] requires a particular noun class,
namely human organizations. In the P.G.R. corpus, this syntactic context selects
for nouns such as república (republic), governo (government), instituto
(institute), conselho (council), etc. On the other hand, context
[λx↓ (de; x↓, república↑)] requires nouns denoting either human beings or
organizations: presidente (president), ministro (minister of state), assembleia
(assembly), governo (government), procurador (attorney), procuradoria-geral
(attorneyship), ministério (state department), etc.

It follows that the two words related by a syntactic dependency are mutually
determined. The context constituted by a word and a specific function imposes
semantic conditions on the other word of the dependency. The converse is also
true. As has been said, the process of mutual restriction between two related
words is called co-specification. In presidente da república, the context
constituted by the noun presidente and the grammatical function head somehow
restricts the sense of república. Conversely, both the noun república and the
role of complement also restrict the sense of presidente:

- [λx↓ (de; x↓, república↑)] selects for presidente

- [λx↑ (de; presidente↓, x↑)] selects for república

In our system, the extraction module performs the following two tasks:
first, the syntactic contexts associated with all candidate binary dependencies
are extracted. Then, the sets of words appearing in those syntactic contexts are
selected. The words appearing in a particular syntactic context form a contextual
word set. Contextual word sets are taken as the input of the processes of filtering
and clustering; these processes will be described in the next section.

Finally, note that, unlike in Grefenstette’s approach [9], information on
co-specification is available for the characterisation of syntactic contexts. In [7],
a strategy for measuring word similarity based on the co-specification hypothesis
was compared to Grefenstette’s strategy. Experimental tests demonstrated
that co-specification allows a finer-grained characterisation of syntactic contexts.

4 Filtering and Clustering

According to the contextual hypothesis introduced above, two syntactic contexts
that select for the same words should have the same extensional definition and,
hence, the same selection restrictions. So, if two contextual word sets are
considered similar, we infer that their associated syntactic contexts are
semantically similar and share the same selection restrictions. In addition, we
infer that these contextual word sets are semantically homogeneous and represent
a contextually determined class of words. Let’s take the following two syntactic
contexts and their associated contextual word sets:

[λx↑ (of; infringement↓, x↑)] = {article law norm precept statute ...}

[λx↑ (dobj; infringe↓, x↑)] = {article law norm principle right ...}

Since both contexts share a significant number of words, it can be argued that
they share the same selection restrictions. Furthermore, it can be inferred that
their associated contextual sets represent the same context-dependent semantic
class. In our corpus, context [λx↑ (dobj; violar↓, x↑)] (to infringe) is
considered similar not only to context [λx↑ (de; violação↓, x↑)]
(infringement of), but also to other contexts such as
[λx↑ (dobj; respeitar↓, x↑)] (to respect) and [λx↑ (dobj; aplicar↓, x↑)]
(to apply).

In this section, we will specify the procedure for learning context-dependent 
semantic classes from the previously extracted contextual sets. This will be done 
in two steps: 

- Filtering: word sets are automatically cleaned by removing those words that
  are not semantically homogeneous.

- Conceptual clustering: the previously cleaned sets are successively
  aggregated into more general clusters. This allows us to build more abstract
  semantic classes and, hence, to induce more general selection restrictions.

4.1 Filtering 

As has been said in the introduction, the cooperative system Asium is also based
on the contextual hypothesis [4,5]. This system requires the interactive
participation of a language specialist in order to filter and clean the word
sets when they are taken as input of the clustering strategy. In such a
cooperative method, words that have been incorrectly tagged or analysed are
removed from the sets manually. Our strategy, by contrast, aims to remove
incorrect words from sets automatically. Automatic filtering consists of the
following subtasks:

First, each word set is associated with a list of its most similar sets. In- 
tuitively, two sets are considered as similar if they share a significant number 
of words. Various similarity measure coefficients were tested to create lists of 
similar sets. The best results were achieved using a particular weighted version 
of the Jaccard coefficient, where words are weighted considering their dispersion 
(global weight) and their relative frequency for each context (local weight). Word 
dispersion (global weight) disp takes into account how many different contexts 
are associated with a given word and the word frequency in the corpus. The 
local weight is calculated by the relative frequency fr of the pair word/context. 
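The weight of a word word_i with a context cntx_j is computed by the following formula:

$$W(\mathit{word}_i, \mathit{cntx}_j) = \log_2(fr_{ij}) \cdot \log_2(disp_i)$$

where

$$fr_{ij} = \frac{\text{frequency of } \mathit{word}_i \text{ with } \mathit{cntx}_j}{\text{sum of frequencies of words occurring in } \mathit{cntx}_j}$$

and

$$disp_i = \frac{\sum_j \text{frequency of } \mathit{word}_i \text{ with } \mathit{cntx}_j}{\text{number of contexts with } \mathit{word}_i}$$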

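So, the weighted Jaccard similarity WJ between two contexts m and n is
computed by:†

$$WJ(\mathit{cntx}_m, \mathit{cntx}_n) = \frac{\sum_{common_i} \big(W(\mathit{word}_i, \mathit{cntx}_m) + W(\mathit{word}_i, \mathit{cntx}_n)\big)}{\sum_j \big(W(\mathit{word}_j, \mathit{cntx}_m) + W(\mathit{word}_j, \mathit{cntx}_n)\big)}$$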
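
The weighting scheme and the similarity coefficient can be sketched as follows. This is only an illustration under stated assumptions: co-occurrence counts are given as a dictionary from contexts to word frequencies, a word that does not occur with a context contributes a weight of 0 to the denominator, and the names `word_weights` and `weighted_jaccard`, as well as the toy counts, are hypothetical.

```python
from math import log2

def word_weights(counts):
    """counts: {context: {word: frequency}} -> {(word, context): W}."""
    total_freq = {}   # frequency of each word summed over all contexts
    n_contexts = {}   # number of different contexts each word occurs with
    for words in counts.values():
        for word, freq in words.items():
            total_freq[word] = total_freq.get(word, 0) + freq
            n_contexts[word] = n_contexts.get(word, 0) + 1
    weights = {}
    for context, words in counts.items():
        context_total = sum(words.values())
        for word, freq in words.items():
            fr = freq / context_total                    # local weight
            disp = total_freq[word] / n_contexts[word]   # global weight
            weights[(word, context)] = log2(fr) * log2(disp)
    return weights

def weighted_jaccard(counts, weights, m, n):
    """WJ between contexts m and n; absent (word, context) pairs weigh 0."""
    common = set(counts[m]) & set(counts[n])
    numerator = sum(weights[(w, m)] + weights[(w, n)] for w in common)
    denominator = (sum(weights[(w, m)] for w in counts[m])
                   + sum(weights[(w, n)] for w in counts[n]))
    return numerator / denominator if denominator else 0.0

# Toy counts (invented for illustration only).
counts = {
    "dobj:violar": {"lei": 4, "norma": 2, "vez": 2},
    "de:violação": {"lei": 2, "norma": 2, "artigo": 4},
    "dobj:comer": {"pão": 3, "vez": 1},
}
weights = word_weights(counts)
close = weighted_jaccard(counts, weights, "dobj:violar", "de:violação")
far = weighted_jaccard(counts, weights, "dobj:violar", "dobj:comer")
print(close > far)
```

Since fr is at most 1 and disp is at least 1, every weight is zero or negative; the coefficient nevertheless lies between 0 and 1 because numerator and denominator have the same sign.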



Then, once each contextual set has been compared to the other sets, we select
the words shared by each pair of similar sets, i.e., we take the intersection
of each pair of sets considered similar. Since words that are not shared by
two similar sets could be incorrect words, we remove them. Intersection allows
us to clear sets of words that are not semantically homogeneous. Thus, the
intersection of two similar sets represents a semantically homogeneous class,
which we call a basic class. Let’s take an example. In our corpus, the most
similar set to [λx↑ (de; violação↓, x↑)] (infringement of) is
[λx↑ (dobj; violar↓, x↑)] (infringe). Both sets share the following words:

sigilo princípios preceito plano norma lei estatuto disposto
disposição direito convenção artigo

(secret principle precept plan norm law statute rule precept
right convention article)

This basic class does not contain incorrect words such as vez, flagrantemente,
obrigação, interesse (time, notoriously, obligation, interest), which
were oddly associated with the context [λx↑ (de; violação↓, x↑)], but which do
not appear in context [λx↑ (dobj; violar↓, x↑)]. This class seems to be
semantically homogeneous because it contains only words referring to legal
documents. Once basic classes have been created, they are used by the conceptual
clustering algorithm to build more general classes. Note that this strategy
removes neither infrequent nor very frequent words. Frequent and infrequent
words may be semantically significant provided that they occur with similar
syntactic contexts.

† In the formula for WJ, common means that only the words common to both
contexts m and n are taken into account.
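
The intersection step can be sketched in a few lines. The sketch assumes that the most similar set of each context has already been determined (for instance with the weighted Jaccard coefficient); the function name `basic_classes`, the string encoding of contexts and the reduced word sets are hypothetical and serve only as an illustration.

```python
def basic_classes(word_sets, most_similar):
    """Intersect each contextual word set with its most similar set.

    word_sets:    {context: set of words}
    most_similar: {context: most similar context}, assumed precomputed
    Returns {frozenset({context, partner}): basic class}.
    """
    classes = {}
    for context, words in word_sets.items():
        partner = most_similar.get(context)
        if partner is not None:
            # Words not shared by both sets are treated as possibly incorrect.
            classes[frozenset((context, partner))] = words & word_sets[partner]
    return classes

# Reduced version of the example discussed in the text.
word_sets = {
    "de:violação": {"lei", "norma", "artigo", "estatuto", "vez", "interesse"},
    "dobj:violar": {"lei", "norma", "artigo", "estatuto"},
}
most_similar = {"de:violação": "dobj:violar", "dobj:violar": "de:violação"}
classes = basic_classes(word_sets, most_similar)
print(classes[frozenset(("de:violação", "dobj:violar"))])
```

Because a pair of contexts is stored under an unordered key, two mutually most similar contexts yield a single basic class.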

4.2 Conceptual Clustering 

We use agglomerative (bottom-up) clustering to successively aggregate the
previously created basic classes. Unlike most research on conceptual clustering,
aggregation does not rely on a statistical distance between classes, but on
empirically set conditions and constraints [21]. These conditions are discussed
below. Figure 2 shows two basic classes associated with two pairs of similar
syntactic contexts. [CONTXi] represents a pair of syntactic contexts sharing the
words preceito, lei, norma (precept, law, norm), and [CONTXj] represents a
pair of syntactic contexts sharing the words preceito, lei, direito (precept,
law, right). Both basic classes are obtained from the filtering process described
in the previous section. Figure 3 illustrates how basic classes are aggregated
into more general clusters. If two classes fulfil the conditions defined below,
they can be merged into a new class. The two basic classes of the example are
clustered into the more general class constituted by preceito, lei, norma,
direito. Such a generalisation leads us to induce syntactic data that do not
appear in the corpus. Indeed, we induce both that the word norma may appear
in the syntactic contexts represented by [CONTXj], and that the word direito
may be attached to the syntactic contexts represented by [CONTXi].




[Fig. 2. Basic classes: [CONTXi] with preceito, lei, norma; [CONTXj] with
preceito, lei, direito.]

[Fig. 3. Agglomerative clustering: [CONTXi] and [CONTXj] jointly associated
with the merged class norma, preceito, lei, direito.]



Two basic classes are compared and then aggregated into a new, more general
class if they fulfil three specific conditions:

1. They must have the same number of words. We consider that two classes are
   compared in a more efficient manner when they have the same number of
   elements. Indeed, nonsensical results could be obtained if we compared large
   classes, which still remain polysemic and hence heterogeneous, to the small
   classes that are included in them.

2. They must share n - 1 words. Two classes sharing n - 1 words are aggregated
   into a new class of n + 1 members. Indeed, two classes with the same number
   of elements that differ in only one word may be considered semantically
   close.

3. They must have the highest weight. The weight of a class corresponds to the
   number of occurrences of the class as a subset of other classes (within n + 20
   supersets). Intuitively, the more a class is included in larger classes, the more
   semantically homogeneous it should be. Only those classes with the highest
   weight will be compared and aggregated.

Note that clustering does not rely here on a statistical distance between
classes. Rather, clustering is guided by a set of constraints, which have been
empirically defined considering linguistic data. Due to the nature of these
constraints, the clustering process should start with small classes of n
elements, in order to create larger classes of n + 1 members. All classes of size
n that fulfil the conditions stated above are aggregated into n + 1 clusters. In
this agglomerative clustering strategy, level n is defined by the classes with n
elements. The algorithm continues merging clusters at more complex levels and
stops when there are no more clusters fulfilling the three conditions.
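
One level of the aggregation process can be sketched as follows. Several points are assumptions made for this illustration rather than details given above: "highest weight" is approximated by a minimum weight threshold `min_weight`, the phrase "within n + 20 supersets" is read as counting only supersets with at most n + 20 words, and the function names are hypothetical.

```python
from itertools import combinations

def class_weight(cls, all_classes, max_extra=20):
    """Number of larger classes (at most max_extra words bigger) containing cls."""
    return sum(1 for other in all_classes
               if cls < other and len(other) <= len(cls) + max_extra)

def merge_level(level_classes, all_classes, min_weight=1):
    """Aggregate classes of size n into classes of size n + 1."""
    # Condition 3 (approximated): keep only sufficiently weighted classes.
    eligible = [c for c in level_classes
                if class_weight(c, all_classes) >= min_weight]
    merged = set()
    for a, b in combinations(eligible, 2):
        # Condition 1: same size. Condition 2: the classes share n - 1 words.
        if len(a) == len(b) and len(a & b) == len(a) - 1:
            merged.add(a | b)
    return merged

# The two basic classes of Figure 2, plus one hypothetical larger class.
a = frozenset({"preceito", "lei", "norma"})
b = frozenset({"preceito", "lei", "direito"})
big = frozenset({"preceito", "lei", "norma", "direito", "artigo"})
print(merge_level([a, b], [a, b, big]))
```

Applying `merge_level` repeatedly to its own output, until it returns an empty set, mirrors the stopping criterion described above.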

4.3 Tests and Evaluation 

We used a small corpus with 1,643,579 word occurrences, selected from the
case-law P.G.R. text corpora. First, the corpus was tagged by the part-of-speech
tagger presented in [14]. Then, it was analysed into sequences of basic chunks
by the partial parser presented in [18]. The chunks were attached using the
right association heuristic so as to create binary dependencies. From these
dependencies, 211,976 different syntactic contexts with their associated word
sets were extracted. Then, we filtered these contextual word sets using the
method described above so as to obtain a list of basic classes.

In order to test our clustering strategy, we started the algorithm with basic
classes of size 4 (i.e., classes with 4 elements). There are 7,571 basic classes
with 4 elements, but only a small part of them fulfils the clustering conditions
and forms 1,243 clusters with 5 elements. At level 7, there are still 600
classes fulfilling the clustering conditions, 263 at level 9, 112 at level 11,
38 at level 13, and finally only 1 at level 19. Table 2 shows some of the
clusters generated by the algorithm at different intermediate levels.‡

Note that some words may appear in different clusters. For instance, cargo
(task/post) is associated with nouns referring to activities (e.g., actividade,
trabalho, tarefa (activity, work, task)), as well as with nouns referring to the
positions where those activities are carried out (e.g., cargo, categoria, lugar
(post, rank, place)). The senses of polysemic words are represented by the
natural attribution of a word to various clusters.

Table 2. Some clusters at levels 6, 7, 8, 9, 10, and 11

Weight (size)   Cluster

0006 (06)       aludir citar enunciar indicar mencionar referir
                (allude cite enunciate indicate mention refer)

0009 (07)       considerar constituir criar definir determinar integrar referir
                (consider constitute create define determine integrate refer)

0002 (07)       actividade atribuição cargo função funções tarefa trabalho
                (activity attribution position/task function functions task work)

0003 (08)       administração cargo categoria exercício função lugar regime serviço
                (administration post rank practice function place regime service)

0002 (09)       abono indemnização multa pensão propina remuneração renda sanção
                vencimento
                (bail compensation fine pension fee remuneration rent sanction salary)

0007 (10)       administração autoridade comissão conselho direcção estado governo
                ministro tribunal órgão
                (administration authority commission council direction state
                government minister tribunal organ)

0026 (11)       alínea artigo código decreto diploma disposição estatuto legislação
                lei norma regulamento
                (paragraph article code decree diploma disposition statute
                legislation law norm regulation)

‡ In the left column of Table 2, the first number represents the weight of the
set, i.e., its number of occurrences as a subset of larger supersets; the second
number represents the class cardinality.



Table 3. Some clusters generated at level 12

Weight (size)   Cluster

0002 (12)       administração associação autoridade comissão conselho direcção
                entidade estado governo ministro tribunal órgão
                (administration association authority commission council direction
                entity state government minister tribunal organ)

0002 (12)       administração assembleia autoridade comissão conselho direcção
                director estado governo ministro tribunal órgão
                (administration assembly authority commission council direction
                director state government minister tribunal organ)

0002 (12)       assembleia autoridade câmara comissão direcção estado europol
                governo ministério pessoa serviço órgão
                (assembly authority chamber commission direction state europol
                government state-department person service organ)

0002 (12)       administração autoridade comissão conselho direcção empresa estado
                gestão governo ministério serviço órgão
                (administration authority commission council direction firm state
                management government state-department service organ)



Since this clustering strategy has been conceived to ensure the semantic
homogeneity of clusters, it does not guarantee that each cluster represents
an independent semantic class. Hence, two or more clusters can represent the
same context-based class and, therefore, the same semantic restriction. Consider
Table 3. Intuitively, the four clusters, which were generated at level 12,
refer to one general semantic class: agentive entities. Yet, the algorithm is
not able to aggregate them into a more general cluster at level 13, since they
do not fulfil condition 2. Further work should refine the algorithm in order to
solve this problem.

Note that the algorithm does not generate ontological classes like human
beings, institutions, vegetables, dogs, etc., but context-based semantic classes
associated with syntactic contexts. Indeed, the generated clusters are not
language-independent objects but semantic restrictions taking part in the
syntactic analysis of sentences. Thus, the words autoridade, pessoa,
administração, etc. (authority, person, administration) belong to the same
contextual class because they share a great number of syntactic contexts;
namely, they appear as the subject of verbs such as aprovar, revogar,
considerar, etc. (approve, repeal, consider). Those nouns do not form an
ontological class but rather a linguistic class used to constrain syntactic
word combination. More precisely, we may infer that the following syntactic
contexts:

[λx↑ (subj; aprovar↓, x↑)]

[λx↑ (subj; revogar↓, x↑)]

[λx↑ (subj; considerar↓, x↑)]

share the same selection restrictions since they are used to build a
context-based semantic class constituted by words like autoridade, pessoa,
administração, etc. As has been said, the acquired selection restrictions should
be used to check the attachment hypotheses concerning the candidate
dependencies previously extracted.

5 Conclusion and Further Work 

This paper has presented a particular unsupervised strategy to automatically
learn context-based semantic classes used as restrictions on syntactic
combinations. The strategy is mainly based on two linguistic assumptions: the
co-specification hypothesis, i.e., the two related expressions in a binary
dependency impose semantic restrictions on each other, and the contextual
hypothesis, i.e., two syntactic contexts share the same semantic restrictions if
they co-occur with the same words.

The system introduced in this paper will be applied to attachment resolution 
in syntactic analysis. The degree of efficacy in such a task may serve as a reliable 
evaluation for measuring the soundness of our learning strategy. This is part of 
our current work. 

Abstract. This paper presents the APRIORI-C algorithm, modifying
the association rule learner APRIORI to learn classification rules. The 
algorithm achieves decreased time and space complexity, while still per- 
forming exhaustive search of the rule space. Other APRIORI-C improve- 
ments include feature subset selection and rule post-processing, leading 
to increased understandability of rules and increased accuracy in domains 
with unbalanced class distributions. In comparison with learners which 
use the covering approach, APRIORI-C is better suited for knowledge 
discovery since each APRIORI-C rule has high support and confidence. 



1 Introduction 

Mining of association rules has received a lot of attention in recent years. Com- 
pared to other machine learning techniques, its main advantage is a low number 
of database passes done when searching the hypothesis space, whereas its main 
disadvantage is time complexity. 

One of the best-known association rule learning algorithms is APRIORI
[3,4]. This algorithm has been extensively studied, adapted to other areas of
machine learning and data mining, and successfully applied in many problem
domains [5,15,1,14,2].

An association rule R has the form $X \rightarrow Y$, for $X, Y \subseteq I$,
where I is the set of all items*, and X and Y are itemsets. If freq(X) denotes
the number of transactions that are supersets of itemset X, and N the number of
all transactions, then $Support(X) = \frac{freq(X)}{N}$. Each rule is associated
with its confidence and support:

$$Confidence(R) = \frac{freq(X \cup Y)}{freq(X)}, \qquad Support(R) = \frac{freq(X \cup Y)}{N}$$
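
As a small worked illustration of these definitions, the sketch below computes the support and confidence of a rule X → Y over a toy list of transactions. The function names and the transactions are hypothetical and chosen only for this example.

```python
def freq(itemset, transactions):
    """Number of transactions that are supersets of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """Support of the rule X -> Y: freq(X ∪ Y) / N."""
    return freq(x | y, transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: freq(X ∪ Y) / freq(X).

    Assumes X occurs in at least one transaction.
    """
    return freq(x | y, transactions) / freq(x, transactions)

transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
x, y = frozenset({"a"}), frozenset({"b"})
print(support(x, y, transactions), confidence(x, y, transactions))
```

Here {a} occurs in three of the four transactions and {a, b} in two, so the rule a → b has support 2/4 and confidence 2/3.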

This paper presents our algorithm APRIORI-C, based on APRIORI, whose
modifications enable it to be used for classification. The basic APRIORI-C
algorithm is described in Section 2. Given that the idea of using association
rules for classification is not new [13], the main contributions of this paper
are substantially decreased memory consumption and time complexity (described in
Section 2), further decreased time complexity through feature subset selection
(Section 3), and improved understandability of results through rule
post-processing (Section 4).

* In machine learning terminology, an item is a binary feature. In association
rule learning, a binary attribute A_i = v_j is generated for each value v_j of a
discrete attribute A_i. For numeric attributes, items are formed by attribute
discretization. Throughout this paper, items and features are used as synonyms.

2 The Core of APRIORI-C

Association rule learning can be adapted for classification purposes by imple- 
menting the following steps: 

1. Discretize continuous attributes. 

2. For each discrete attribute with N values, create N items: for a discrete
value v_i, the i-th item of this attribute is assigned value 1 and the others
are assigned value 0. In the case of a missing value, all items are assigned
value 0.

3. Run an association rule learning algorithm.

4. Collect rules whose right-hand side consists of a single item, representing a
value of the target attribute.

5. Use this set of rules to classify unclassified examples. 
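
Step 2 can be illustrated with a short sketch. It assumes that each example is a dictionary from attribute names to already discretized values, that a missing value is encoded as None, and that an item is written as the string "attribute=value"; these encodings and the name `to_items` are hypothetical.

```python
def to_items(example):
    """Turn a discretized example into the set of items whose value is 1.

    A missing value (None) produces no item, which corresponds to assigning
    value 0 to all items of that attribute.
    """
    return frozenset(f"{attribute}={value}"
                     for attribute, value in example.items()
                     if value is not None)

example = {"outlook": "sunny", "windy": None, "class": "play"}
print(sorted(to_items(example)))
```

Representing an example by the set of its items with value 1 is equivalent to the 0/1 encoding described above and keeps transactions compact.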

To classify an unclassified example with all the rules found by the algorithm, 
first, sort rules according to some criteria, next, go through the list of all rules 
until the first rule that covers the example is found, and finally, classify the 
example according to the class at the right-hand side of this rule. If no rule 
covers the example, mark the example as unclassified. 
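
The classification procedure can be sketched as follows. The sorting criterion is not fixed above, so the sketch assumes, purely for illustration, that rules are ordered by decreasing confidence and then by decreasing support; the tuple layout of a rule and the name `classify` are likewise hypothetical.

```python
def classify(example_items, rules):
    """Return the class of the first rule covering the example, or None.

    rules: list of (body, target_class, support, confidence), where body is
    a frozenset of items. None stands for "unclassified".
    """
    # Assumed criterion: higher confidence first, ties broken by support.
    ordered = sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)
    for body, target_class, _support, _confidence in ordered:
        if body <= example_items:   # the rule covers the example
            return target_class
    return None

rules = [
    (frozenset({"a"}), "pos", 0.30, 0.92),
    (frozenset({"a", "b"}), "neg", 0.10, 0.99),
]
print(classify(frozenset({"a", "b"}), rules))
print(classify(frozenset({"a"}), rules))
print(classify(frozenset({"c"}), rules))
```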

The above modifications are straightforward. Nevertheless, to better adapt 
the algorithm to classification purposes, APRIORI-C includes the following op- 
timizations: 

Classification rule generation. Rules with a single target item at the
right-hand side can be created during the search. To do so, the algorithm needs
to save only the supported itemsets of sizes k and k + 1. This results in
decreased memory consumption (improved by a factor of 10). Notice, however,
that this does not improve the algorithm’s time complexity.

Prune irrelevant rules. Classification rule generation can be suppressed if one
of the existing generalizations of the rule has support and confidence above
the given thresholds. To prevent rule generation, the algorithm simply
excludes the corresponding itemset from the set of supported (k+1)-itemsets.
The reduction in time and space complexity is considerable (improved by a
factor of 10 or more).

Prune irrelevant items. If an item cannot be found in any of the itemsets
containing the target item, then it is impossible to create a rule containing
this item. Hence, APRIORI-C prunes the search by discarding all itemsets
containing this item.

The above optimizations assume that we want to find only rules which have 
a single item, called a target item, at their right-hand side. Consequently, one 
has to run the algorithm for each target item, representing each individual class. 
The algorithm, named APRIORI-C, is outlined in Figure 1. 

APRIORI-C was evaluated on 17 datasets from the UCI ML databases repository
(ftp://ftp.ics.uci.edu/pub/machine-learning-databases/). Most datasets contain
continuous attributes which had to be discretized. This was done by using
the K-Means clustering algorithm [16,7], which was an arbitrary choice.

for each target item T
    k = 1
    C_1 = set of all 1-itemsets
    check the support of all itemsets in C_1
    delete unsupported itemsets from C_1
    while C_k not empty do
        build all potentially supported (k+1)-itemsets
        put them into C_{k+1}
        check their support
        delete unsupported itemsets from C_{k+1}
        find all rules that have target item T at their right-hand side
        delete all itemsets that were used for constructing rules
        detect irrelevant items and delete irrelevant itemsets
        k = k + 1
    end while
end for

Fig. 1. The APRIORI-C algorithm.
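
The outline in Figure 1 can be approximated by the following simplified sketch for a single target item. It is not the authors' implementation: it tracks only rule bodies (itemsets without the target item), treats a body as supported when the body together with the target item reaches the minimum support, and interprets MaxRuleLength as the maximal body length. The function name, the parameter names and the toy data are hypothetical.

```python
from itertools import combinations

def apriori_c(transactions, target, class_items, min_support=0.03,
              min_confidence=0.9, max_rule_length=5):
    """Level-wise search for rules body -> target (simplified sketch)."""
    n = len(transactions)

    def freq(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    # C_1: all 1-itemsets built from non-class items.
    level = {frozenset([i]) for t in transactions for i in t
             if i not in class_items}
    k = 1
    while level and k <= max_rule_length:
        open_sets = set()   # supported bodies that did not yet yield a rule
        for body in level:
            n_both = freq(body | {target})
            if n_both == 0 or n_both / n < min_support:
                continue    # unsupported itemset: delete it
            confidence = n_both / freq(body)
            if confidence >= min_confidence:
                # A rule was found; its body is not specialized any further.
                rules.append((body, target, n_both / n, confidence))
            else:
                open_sets.add(body)
        # Build (k+1)-itemsets whose k-subsets are all still open.
        level = set()
        for a, b in combinations(open_sets, 2):
            candidate = a | b
            if len(candidate) == k + 1 and all(
                    frozenset(s) in open_sets
                    for s in combinations(candidate, k)):
                level.add(candidate)
        k += 1
    return rules

transactions = [
    frozenset({"a", "b", "class=pos"}),
    frozenset({"a", "b", "class=pos"}),
    frozenset({"a", "class=neg"}),
    frozenset({"b", "class=neg"}),
    frozenset({"c", "class=neg"}),
]
class_items = {"class=pos", "class=neg"}
rules = apriori_c(transactions, "class=pos", class_items,
                  min_support=0.2, min_confidence=0.9)
print(rules)
```

In this toy run neither a nor b alone is confident enough, so both stay open and are combined into the body {a, b}, which yields the only rule. Items that never appear in an open body are dropped automatically by the candidate generation step, which loosely corresponds to pruning irrelevant items.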

In the experiments, MinConfidence was set to 0.9, MinSupport to 0.03 and
MaxRuleLength to 5. This setting works well for most of the datasets. For a few
other datasets, MinSupport had to be set to a higher value, e.g., 0.05 or even
0.1, to prevent combinatorial explosion. In contrast, for some other datasets the
algorithm did not find any rule, hence MinSupport was set to 0.01. Using 10-fold
cross-validation, the accuracy of APRIORI-C was compared to the accuracies
of other machine learning algorithms (results reported in [18], using default
parameter settings; notice that the accuracy of these algorithms could be further
improved by parameter tuning). Results in Table 1 show that in datasets with
few or no continuous attributes, APRIORI-C is at least as good as the other
algorithms, sometimes even slightly better (glass, tic-tac-toe and vote). However,
APRIORI-C performs poorly on the datasets containing continuous attributes
(e.g., balance, waveform and wine). Except for APRIORI-C, all the other
algorithms handle continuous attributes by themselves and not in pre-processing.
The poor results can be mainly attributed to the suboptimal discretization
currently used in APRIORI-C.



3 Pre-processing by Feature Subset Selection 

The most serious problem of APRIORI-C is that an increased number of features
results in an exponential increase in time complexity. Attribute correlations can
also seriously affect the performance and accuracy. To deal with these problems, 
feature subset selection was employed to select a feature subset, sufficient for 
the learner to construct a classifier of similar accuracy in shorter time. The 
following four approaches were tested: statistical correlation, odds ratio, and two 
variants of the RELIEF algorithm [11,17,12]. Experiments showed that in the 
association rule learning context, feature subset selection needs to be sensitive 
to the difference in the meaning of attribute values 0 and 1: while value 1 is 
favorable for rule construction, value 0 is not used in a constructed rule, it just 
prohibits rule construction. 

Table 1. Results of APRIORI-C compared to other algorithms.

Dataset          C4.5   CN2    kNN    Ltree  LBayes  APRIORI-C
australian       0.85   0.83   0.85   0.87   0.86    0.86
balance          0.77   0.81   0.80   0.93   0.87    0.72
breast-w         0.94   0.94   0.95   0.94   0.97    0.92
bridges-td       0.85   0.85   0.85   0.85   0.88    0.83
car              0.92   0.95   0.94   0.88   0.86    0.85
diabetes         0.74   0.74   0.73   0.75   0.77    0.72
echocardiogram   0.65   0.66   0.69   0.64   0.71    0.65
german           0.72   0.73   0.70   0.73   0.75    0.70
glass            0.70   0.65   0.70   0.68   0.63    0.77
hepatitis        0.79   0.80   0.85   0.82   0.85    0.80
hypothyroid      0.99   0.99   0.97   0.99   0.96    0.95
image            0.97   0.86   0.97   0.97   0.92    0.77
iris             0.95   0.93   0.95   0.97   0.98    0.96
tic-tac-toe      0.85   0.98   0.91   0.82   0.70    0.99
vote             0.96   0.95   0.90   0.95   0.9     0.96
waveform         0.76   0.69   0.81   0.85   0.86    0.34
wine             0.94   0.93   0.97   0.97   0.99    0.88


Statistical Correlation. In association rule learning, strong correlations among
items cause a drastic increase in the number of supported itemsets. Correlated
items were removed by first testing all the pairs of non-target items and removing
one of them if the correlation was above the user-defined threshold MaxCorr.
The algorithm complexity is O(n^2) w.r.t. the number of items.
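
A possible reading of this pre-processing step is sketched below. The text does not say which correlation coefficient is used or which item of a correlated pair is removed, so the sketch assumes the Pearson coefficient on the 0/1 item columns and keeps the item that comes first; the function names and toy columns are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long 0/1 columns (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, max_corr):
    """Keep an item only if it is not too correlated with any kept item."""
    kept = []
    for name, column in columns.items():
        if all(pearson(column, columns[k]) <= max_corr for k in kept):
            kept.append(name)
    return kept

columns = {
    "a": [1, 1, 0, 0],
    "b": [1, 1, 0, 0],   # duplicates item a
    "c": [1, 0, 1, 0],   # uncorrelated with item a
}
print(drop_correlated(columns, max_corr=0.8))
```

Every item is compared with each previously kept item, so the number of comparisons grows quadratically with the number of items.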

Odds Ratio. For target class C_1, the union of all the non-target classes C_2,
and n training examples, the odds ratio is computed for each feature F as follows:

$$OddsRatio(F) = \log \frac{odds(F \mid C_1)}{odds(F \mid C_2)}$$

where odds(F) equals $\frac{1/n^2}{1 - 1/n^2}$ if p(F) = 0; equals
$\frac{1 - 1/n^2}{1/n^2}$ if p(F) = 1; and finally, equals
$\frac{p(F)}{1 - p(F)}$ if p(F) ≠ 0 and p(F) ≠ 1. Probabilities are estimated
using the relative frequency. Features are then sorted in
descending order of odds ratio values and the number of items is selected
w.r.t. the RatioOfSelectedItems parameter. Given that the odds ratio distinguishes
between attribute values 1 and 0, it performs very well (see also results in [17]).
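
The odds ratio score of one binary feature can be sketched as below. The sketch assumes the usual definition, i.e., the logarithm of the ratio between the odds of the feature in the target class and in the remaining classes, with odds(F) = p(F) / (1 - p(F)); the handling of the boundary cases p(F) = 0 and p(F) = 1 through the constant 1/n² is an assumption of this illustration, as are the function names.

```python
from math import log

def odds(p, n):
    """Odds of a probability p, with assumed smoothing for p = 0 and p = 1."""
    eps = 1.0 / n ** 2
    if p == 0:
        return eps / (1 - eps)
    if p == 1:
        return (1 - eps) / eps
    return p / (1 - p)

def odds_ratio(feature, labels, target):
    """feature: list of 0/1 values; labels: class labels of the same length.

    Assumes n >= 2 and that both the target class and the union of the
    non-target classes contain at least one example.
    """
    n = len(labels)
    in_target = [f for f, label in zip(feature, labels) if label == target]
    in_others = [f for f, label in zip(feature, labels) if label != target]
    p_target = sum(in_target) / len(in_target)   # relative frequency in C_1
    p_others = sum(in_others) / len(in_others)   # relative frequency in C_2
    return log(odds(p_target, n) / odds(p_others, n))

labels = ["+", "+", "-", "-"]
print(odds_ratio([1, 1, 0, 0], labels, "+"))   # feature typical of the target class
print(odds_ratio([1, 0, 1, 0], labels, "+"))   # feature unrelated to the class
```

A feature that has value 1 mostly in the target class receives a high score, whereas value 0 carries no positive evidence.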

VRELIEF and VRELIEF2. In the RELIEF algorithm below [12], q is a list of
quality estimations, SampleSize is the user-defined number of randomly selected
examples, and d_{rx,i} is the distance between examples r and x for item I_i.



for each item I_i do q[I_i] := 0
for j = 1 to SampleSize do
    randomly select an example r
    find nearest hit t and nearest miss s
    for each item I_i do
        q[I_i] := q[I_i] + (d_{rs,i} - d_{rt,i}) / SampleSize
sort items according to measure q
select the best items
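
For readers who prefer executable code, here is a small sketch of the basic RELIEF
loop for binary items. It is an illustration, not the authors' implementation: the
per-item distance is assumed to be the absolute difference of the two values, nearest
neighbours are found with the Hamming distance, and the update rewards items that
differ on the nearest miss and penalises items that differ on the nearest hit.

```python
import random


def hamming(a, b):
    """Number of items on which two examples differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def relief(examples, labels, sample_size, rng=None):
    """Return one quality estimate per item (higher is better)."""
    rng = rng or random.Random(0)
    n_items = len(examples[0])
    q = [0.0] * n_items
    indices = range(len(examples))
    for _ in range(sample_size):
        r = rng.randrange(len(examples))
        hits = [k for k in indices if k != r and labels[k] == labels[r]]
        misses = [k for k in indices if labels[k] != labels[r]]
        if not hits or not misses:
            continue  # no hit or no miss available for this example
        t = min(hits, key=lambda k: hamming(examples[r], examples[k]))    # nearest hit
        s = min(misses, key=lambda k: hamming(examples[r], examples[k]))  # nearest miss
        for i in range(n_items):
            d_rs = abs(examples[r][i] - examples[s][i])
            d_rt = abs(examples[r][i] - examples[t][i])
            q[i] += (d_rs - d_rt) / sample_size
    return q


# Item 0 separates the classes perfectly; item 1 is noise.
examples = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["+", "+", "-", "-"]
quality = relief(examples, labels, sample_size=4, rng=random.Random(1))
ranking = sorted(range(len(quality)), key=lambda i: quality[i], reverse=True)
```
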
RELIEF is an excellent algorithm for feature subset selection, but in our
problem it performs poorly since it makes no distinction between attribute values
0 and 1. Consequently, the original RELIEF distance function was changed:
VRELIEF assigns higher quality to an item with value 1 for examples of the same
class. Despite the significant accuracy increase, the algorithm often selects items
that cause an unnecessary increase in time complexity. Hence, the distance function
in VRELIEF2 was changed so that items that increase the complexity without
increasing accuracy receive a lower quality estimate. The distance computations
for training examples, labeled a and b, and values of the tested items for these
two examples, are shown below:







Class membership           RELIEF               VRELIEF              VRELIEF2
                           value_a / value_b    value_a / value_b    value_a / value_b
                           1/1  0/1  1/0  0/0   1/1  0/1  1/0  0/0   1/1  0/1  1/0  0/0
Class(a)=+, Class(b)=+      0    1    1    0     1    0    0   -1     2    0    0   -2
Class(a)=+, Class(b)=-      0    1    1    0     0   -1    1    0    -1   -2    2    0
Class(a)=-, Class(b)=+      0    1    1    0     0    1   -1    0    -1    2   -2    0
Class(a)=-, Class(b)=-      0    1    1    0    -1    0    0    1    -2   -1   -1    2
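
The three distance functions in the table can be written down as a simple lookup.
The values below are transcribed from the table; the layout of the dictionary and
the name `item_distance` are illustrative choices only.

```python
# Column order of every row: (value_a, value_b) = 1/1, 0/1, 1/0, 0/0.
PAIRS = [(1, 1), (0, 1), (1, 0), (0, 0)]

# TABLE[variant][(class_a, class_b)] -> distances in the column order above.
TABLE = {
    "RELIEF": {
        ("+", "+"): [0, 1, 1, 0],
        ("+", "-"): [0, 1, 1, 0],
        ("-", "+"): [0, 1, 1, 0],
        ("-", "-"): [0, 1, 1, 0],
    },
    "VRELIEF": {
        ("+", "+"): [1, 0, 0, -1],
        ("+", "-"): [0, -1, 1, 0],
        ("-", "+"): [0, 1, -1, 0],
        ("-", "-"): [-1, 0, 0, 1],
    },
    "VRELIEF2": {
        ("+", "+"): [2, 0, 0, -2],
        ("+", "-"): [-1, -2, 2, 0],
        ("-", "+"): [-1, 2, -2, 0],
        ("-", "-"): [-2, -1, -1, 2],
    },
}


def item_distance(variant, class_a, class_b, value_a, value_b):
    """Distance contribution of one item for examples a and b."""
    row = TABLE[variant][(class_a, class_b)]
    return row[PAIRS.index((value_a, value_b))]
```

Note that plain RELIEF ignores class membership and the actual values (only whether
they differ), which is exactly the limitation the two variants address.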



Results Algorithms Odds Ratio, VRELIEF and VRELIEF2 were tested with values
of parameter RatioOfSelectedItems set to 0.2, 0.4, 0.6 and 0.8. Algorithm Statistical
Correlation was tested with values of parameter MaxCorr set to 0.2, 0.4, 0.6 and 0.8.
For each value of the parameter, 10-fold cross-validation was performed and a statistical
t-test was computed to see whether the best algorithm is significantly better than the
others, concerning both accuracy and complexity. Below is the summary of results:

- The Odds Ratio algorithm is a good choice for nearly every value of the
RatioOfSelectedItems parameter. Other algorithms often come close, but never surpass it.
Values of this parameter around 0.4 usually suffice for achieving the same
accuracy as if no feature subset selection were performed.

- The Statistical Correlation algorithm is often useful for cleaning the dataset, as
it only reduces the time complexity and retains the quality of results. Values of the
MaxCorr parameter around 0.5 are usually sufficient to drastically decrease
the time complexity while retaining the same accuracy. Notice that this algorithm
can easily be used in combination with other algorithms.

- The VRELIEF and VRELIEF2 algorithms produce worse results in terms of accuracy
when compared to Odds Ratio, but they are better at decreasing the time
complexity. The difference in performance between VRELIEF and VRELIEF2 is
statistically insignificant.



4 Post-processing by Rule Subset Selection 

Rule redundancy It is common that many subsets of the set of constructed rules 
cover almost the same training examples. 

Rule overflow The induced classifier is often not understandable since it includes 
too many rules. The user’s upper limit of understanding is a classifier with 10-15 
rules. Depending on the choice of parameters, the original algorithm often created 
classifiers with more than 100 rules (even more than 500). 
Limited rule coverage The classifier often gives no answer, since no rule in the
classifier fires for an instance to be classified. This problem is very serious when
dealing with highly unbalanced class distributions.

Majority class domination When class distribution is unbalanced, the rules classifying
to the majority class prevail: there are so many of those that rules for the
minority class hardly make it into the classifier. This causes poor accuracy when
classifying instances of the minority class.

To deal with the above problems, APRIORI-C performs rule post-processing to 
select a rule subset with a comparable classification accuracy. 

Default Rule Assignment After the algorithm selects the rules to be used in
a classifier, some unclassified examples will probably remain in the set of training
examples L. These unclassified examples contribute the majority of wrong classifications.
Previous research confirmed that the probability of a wrong classification is much lower
than the probability of being unable to classify an example. The simplest approach is
to add a rule at the end of the rule list that will unconditionally classify an example
to the majority class of the unclassified examples. The same approach, solving the
problem of limited rule coverage, is used in the CN2 algorithm [8].

Select N Best Rules (and Select N Best Rules for Each Class) This 
algorithm uses the covering approach for rule subset selection. The number of rules to 
be selected is specified by the user-defined value of parameter N. The algorithm first 
selects the best rule (with highest support), then eliminates all the covered examples, 
sorts the remaining rules and again selects the best rule. This is repeated until the 
number of selected rules reaches the value of the parameter N or until there are no 
more rules to select or until there are no more examples to cover. The algorithm returns 
a sorted list of rules, and adds the default rule. 

select N
C = {}
put all rules into T
put all the examples into L
sort the rules in T according to their support and confidence in L
while |C| < N and |L| > 0 and |T| > 0 do
    remove best rule r from T and add it to C
    remove the examples, covered by rule r, from L
    recalculate support and confidence of rules in T
end while
build the default rule and add it to C
output = C
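
The covering loop can be sketched in Python as below. Several details are assumptions
made for the sake of a runnable illustration: a rule is a dictionary with an item set
and a class, a rule covers an example when all its items occur in the example, support
is counted as the number of covered examples of the rule's class, and the default rule
falls back to the majority class of all training examples when nothing is left uncovered.

```python
from collections import Counter


def covers(rule, example):
    """A rule covers an example when every item of the rule occurs in it."""
    items, _ = example
    return rule["items"] <= items


def score(rule, examples):
    """(support, confidence) of a rule on the remaining examples."""
    covered = [e for e in examples if covers(rule, e)]
    correct = sum(1 for _, cls in covered if cls == rule["cls"])
    confidence = correct / len(covered) if covered else 0.0
    return (correct, confidence)


def select_n_best(rules, examples, n):
    """Greedy covering selection of at most n rules, followed by a default rule."""
    T, L, C = list(rules), list(examples), []
    while len(C) < n and L and T:
        best = max(T, key=lambda r: score(r, L))  # scores are recomputed on current L
        T.remove(best)
        C.append(best)
        L = [e for e in L if not covers(best, e)]
    pool = L if L else examples  # assumption: fall back to all examples
    default_cls = Counter(cls for _, cls in pool).most_common(1)[0][0]
    C.append({"items": frozenset(), "cls": default_cls})
    return C


examples = [
    ({"a", "b"}, "+"), ({"a"}, "+"),
    ({"b"}, "-"), ({"c"}, "-"), ({"c"}, "-"),
]
rules = [
    {"items": frozenset({"a"}), "cls": "+"},
    {"items": frozenset({"a", "b"}), "cls": "+"},
    {"items": frozenset({"c"}), "cls": "-"},
]
classifier = select_n_best(rules, examples, n=2)
```

Here the rule on {a, b} is never chosen: once the rule on {a} has removed its
examples, the longer rule covers nothing new, which is how the covering scheme
suppresses redundant rules.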

This algorithm solves the problem of rule redundancy and rule overflow. Clearly, 
it also gives classifiers that execute faster. But it remains vulnerable to the problem of 
unbalanced class distribution. To solve this problem, the algorithm can be modified to 
select N rules for each class (if so many rules exist for each class). In this way the 
rules for the minority class also find their way into the constructed classifier. 

Results Each algorithm was tested with several values of parameter N: 1, 2, 5, 10,
15 and 20. The main observations, confirmed by statistical tests, are outlined below:

- Algorithms USE N BEST RULES and USE N BEST RULES FOR EACH CLASS
do not differ in accuracy. Only when there are significant differences in class
distributions does the USE N BEST RULES FOR EACH CLASS algorithm perform better.
Next, both algorithms increase their accuracy significantly when using the default
rule. This increase gets smaller with increasing value of parameter N, but it still
remains noticeable. Finally, both algorithms achieve good results in terms of accuracy
and understandability when the value of parameter N reaches 10. Accuracy
results are comparable to the original algorithm.

- We have also implemented and tested the CONFIRMATION RULE SUBSET SELECTION
algorithm [9], which also employs a greedy approach, but without removing
the covered examples. The accuracy of this algorithm is always slightly
lower than the accuracy of the other two algorithms, since this algorithm does not
build the default rule.

- When comparing classifiers without the default rule, all algorithms achieve similar
performance. This is probably due to the good quality of the rule set given as input
to all the algorithms.

5 Conclusions 

Using association rules for classification purposes has also been addressed by other
researchers. Liu et al. [13] address the same problem. Their approach is similar to ours,
but their algorithm for selecting rules to generate a classifier is more complicated and 
slower. They perform no feature subset selection and no post-processing. Bayardo and 
Agrawal [5,6] address the problem of mining constraint association rules. Implementing 
an efficient search for association rules given item constraints (useful for interactive 
mining) is described by Goethals and Bussche [10]. Their method of pruning is closely 
related to the APRIORI-C method of eliminating irrelevant itemsets. 

This paper presents a new algorithm from the field of association rule learning 
that successfully solves classification problems. The APRIORI-C algorithm is a mod- 
ification of the well-known APRIORI algorithm, making it more suitable for building 
rules for classification, due to the decreased time and space complexity, whereas the 
algorithm still exhaustively searches the rule space. Feature subset selection is best 
done using Odds Ratio, whereas post-processing is best done using the SELECT N 
BEST RULES or the SELECT N BEST RULES FOR EACH CLASS algorithms (for 
N = 10), together with the default rule. 

In UCI domains APRIORI-C achieved results comparable to algorithms that have 
been designed for classification purposes. Given comparable accuracy, the advantage 
of APRIORI-C is that each of its rules represents a reasonable “chunk” of knowledge 
about the problem. Namely, each individual rule is guaranteed to have high support and 
confidence. This is important for knowledge discovery, as shown in the CoIL challenge: 
despite average classification accuracy, the induced rules were evaluated as very useful 
for practice. Most of the practically useful rules (or, better yet, conclusions) reported 
in the CoIL challenge were discovered by APRIORI-C as well. 
Abstract. The goal of this paper is to further investigate the extreme
behaviour of the fuzzy clustering proportional membership model (FCPM) 
in contrast to the central tendency of fuzzy c-means (FCM). A data set 
from the field of psychiatry has been used for the experimental study, 
where the cluster prototypes are indeed extreme, expressing the concept 
of ‘ideal type’. While augmenting the original data set with patients 
bearing less severe syndromes, it is shown that the prototypes found 
by FCM are changed towards the more moderate characteristics of the 
data, in contrast with the almost unchanged prototypes found by FCPM, 
highlighting its suitability to model the concept of ‘ideal type’. 

Keywords: Fuzzy clustering; fuzzy model identification; proportional 
membership; ideal type. 



1 Introduction 

In previous work [1], [2], the fuzzy clustering proportional membership model
(FCPM) was introduced and studied. In contrast to other fuzzy clustering approaches,
FCPM is based on the assumption that the cluster structure is reflected
in the data from which it has been constructed. More specifically, according to
FCPM, the membership of an entity in a cluster expresses the proportion of the
cluster prototype reflected in the entity. One of the phenomena observed in
an extensive experimental study performed with simulated data was that the
cluster prototypes tend to express an extreme rather than average behaviour
within a cluster [2]. Other approaches, such as fuzzy c-means (FCM), tend to
produce clusters whose prototypes are averaged versions of the relevant entities
(up to degrees of the relevance) [3]-[5].

The goal of this paper is to further investigate this property of FCPM. Having
a cluster structure revealed in a data set, we question how sensitive this cluster
structure is to augmenting the data set with entities that bear more or
less similarity to the cluster prototypes. We take a specific data set [6] from the
field of psychiatry, where the cluster prototypes are indeed extreme to express
severe syndromes of mental conditions, and we augment this set with patients
bearing less severe syndromes and explore whether the prototypes are changed
or not. Yes, they are changed towards the more moderate characteristics with
FCM. However, there is almost no change under FCPM, which shows that this
domain may be a natural habitat of the FCPM approach.

2 The Fuzzy Clustering Proportional Membership Model
(FCPM)

2.1 The Proportional Membership Model 

In the clustering model, we assume the existence of some prototypes, offered by 
the knowledge domain, that serve as “ideal” patterns to data entities. To relate 
the prototypes to observations, we assume that the observed entities share parts 
of the prototypes. The underlying structure of this model can be described by 
a fuzzy c-partition defined in such a way that the membership of an entity to
a cluster expresses the proportion of the cluster’s prototype reflected in the entity
(proportional membership). This way, the underlying structure is substantiated
in the fitting of data to the “ideal” patterns. 

To be more specific, let $X = [x_{kh}]$ denote an $n \times p$ entity-to-feature data
table where each entity, described by $p$ features, is defined by the row-vector
$x_k = [x_{kh}] \in \mathbb{R}^p$ ($k = 1, \ldots, n$; $h = 1, \ldots, p$). Let data
matrix $X$ be preprocessed into $Y = [y_{kh}]$ by shifting the origin to the gravity
center of all the entities (rows) in $Y$ and rescaling features (columns) by their
ranges. This data set can be structured according to a fuzzy c-partition, which is a
set of $c$ clusters, any cluster $i$ ($i = 1, \ldots, c$) being defined by: 1) its
prototype, a row-vector $v_i = [v_{ih}] \in \mathbb{R}^p$, and 2) its membership
values $\{u_{ik}\}$ ($k = 1, \ldots, n$), so that the following constraints hold for
all $i$ and $k$:

$$0 \le u_{ik} \le 1, \qquad (1a)$$

$$\sum_{i=1}^{c} u_{ik} = 1. \qquad (1b)$$

Let us assume that each entity $y_k$ of $Y$ is related to each prototype $v_i$ up to
its membership degree $u_{ik}$; that is, $u_{ik}$ expresses the proportion of $v_i$
which is present in $y_k$, in such a way that approximately $y_{kh} = u_{ik} v_{ih}$
for every feature $h$. More formally, we suppose that

$$y_{kh} = u_{ik} v_{ih} + \varepsilon_{ikh}, \qquad (2)$$

where the residual values $\varepsilon_{ikh}$ are as small as possible.

These equations provide a feedback from a cluster structure to the data.
According to (2), a clustering criterion can be defined to fit each data point
to each of the prototypes up to the degree of membership. This goal is achieved
by minimizing all the residual values $\varepsilon_{ikh}$ via one of the least-squares
criteria

$$E_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \sum_{h=1}^{p} u_{ik}^{m} \, (y_{kh} - u_{ik} v_{ih})^2, \qquad (3)$$

subject to constraints (1a) and (1b), where $m = 0, 1, 2$ is a pre-specified parameter.

The equations in (2) along with the least-squares criteria (3) to be minimized
by unknown parameters $U$ and $V = (v_1, v_2, \ldots, v_c) \in \mathbb{R}^{cp}$ for
$Y$ given, are designated as the generic FCPM model. The models corresponding to
$m = 0, 1, 2$ are denoted as FCPM-0, FCPM-1, and FCPM-2, respectively [2].
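
As a small numerical illustration, the least-squares criterion can be evaluated
directly from its definition: the squared residual of each entity against each scaled
prototype is weighted by the m-th power of the membership. The function name
`fcpm_criterion` and the list-of-lists layout (U indexed as U[i][k], with rows for
clusters and columns for entities) are assumptions of this sketch.

```python
def fcpm_criterion(Y, U, V, m):
    """Sum over clusters i, entities k and features h of
    u_ik^m * (y_kh - u_ik * v_ih)^2.

    Y: n entities, each a list of p feature values (already standardized).
    U: c rows of n membership values, U[i][k].
    V: c prototypes, each a list of p values.
    m: 0, 1 or 2.
    """
    total = 0.0
    for i, v in enumerate(V):
        for k, y in enumerate(Y):
            u = U[i][k]
            squared_residual = sum((y_h - u * v_h) ** 2 for y_h, v_h in zip(y, v))
            total += (u ** m) * squared_residual  # note: 0.0 ** 0 == 1.0 in Python
    return total


Y = [[1.0, 0.0], [0.0, 2.0]]
V = [[2.0, 0.0], [0.0, 2.0]]
U = [[0.5, 0.0],   # memberships of both entities in cluster 0
     [0.5, 1.0]]   # memberships of both entities in cluster 1
e2 = fcpm_criterion(Y, U, V, m=2)
e0 = fcpm_criterion(Y, U, V, m=0)
```

With m = 0 every entity-prototype pair counts in full, even when the membership is
zero, whereas with m = 2 pairs with small memberships contribute very little.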

In FCPM, both prototypes and memberships are reflected in the model
of data generation. The equations (2) can be considered as a device to reconstruct
the data from the model. The clustering criteria follow the least-squares
framework to ensure that the reconstruction is as exact as possible. Other
scalarizations of the idea of minimizing the residuals can be considered as
well.

2.2 Ideal Type and FCPM 

Each prototype, $v_i$, according to (2), is an “ideal” point such that any entity,
$y_k$, bears a proportion of it, $u_{ik}$, up to the residuals. The proportion, $u_{ik}$, is
considered as the value of membership of $y_k$ to the cluster $i$, thus representing
the extent to which entity $y_k$ is characterized by the corresponding “ideal” point $v_i$. This
allows us to consider this model as relevant to the concept of ideal type in logics.
M. Weber [1864-1920] defines that ‘The ideal type is such a combination of
characteristics that no real entity can satisfy all of them, though the entities can
be compared by their proximity to the ideal type’ (quoted from [6]). An ideal
type represents a concept self-contained and entirely separated from all the other
(ideal) types.

As our simulation experiments have shown, the model indeed tends to reveal 
somewhat extreme points as the cluster prototypes, especially when FCPM-2 is 
employed [2]. This will be further explored in the remainder. 

2.3 FCPM and FCM 

The criteria (3) can be expressed in terms of the distances between observed
entities $y_k$ and cluster prototypes $v_i$ as follows:

$$E_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m} \, d(y_k, u_{ik} v_i), \qquad (4)$$

where $d(a, b)$ is the Euclidean distance (squared) between p-dimensional vectors
$a$ and $b$.

This shows that these criteria are parallel, to an extent, to those minimized
in the popular fuzzy c-means method (FCM) [3]:

$$B_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m} \, d(y_k, v_i). \qquad (5)$$

The major difference is that FCPM takes the distances between entities $y_k$
and their ‘projections’ on the axes joining 0 and cluster prototypes, while FCM
takes the distances between $y_k$ and the prototypes themselves. This leads to the
following differences in respective algorithms and solutions:

1. The alternating optimization algorithm is straightforward in FCM because 
its criterion (5) very well separates the sets of variables related to centroids 
and to membership. On the contrary, the alternating optimization in FCPM 
requires some work when optimizing the membership, because it is not sep- 
arated from centroids in (4). 

2. FCM gives poor results when m = 0 or m = 1 [3], while these values of
m are admissible in FCPM and lead to interesting results [2]. In particular,
in FCPM-0 each entity is considered as a part of each prototype, leading
to such effects as automatically adjusting the number of clusters to smaller
values. On the other hand, in FCPM-1 and FCPM-2 larger proportions $u_{ik}$
have larger weights over the smaller ones, thus overcoming the rigidity of
FCPM-0. We will stick to the most contrasting of these models, FCPM-2, in
this paper.

3. FCM tends to produce prototypes as averaged versions of their clusters,
while FCPM tends to keep prototypes outside of the data cloud.

4. In FCPM, the following phenomenon occurs which has no analogue in FCM.
Given centroids $v_i$, minimizing (4) over $u_{ik}$ would tend to minimize distances
$d(y_k, u_{ik} v_i)$ by projecting $y_k$ onto the axes joining 0 and $v_i$, thus leading to
$u_{ik} v_i$ being these projections. This would require a careful choice of the
location of the origin of the space in FCPM, which is irrelevant in FCM.
In particular, putting the origin into the gravity center (grand mean) point
would make all potential extreme prototypes differ much from each other
in the framework of FCPM. Figure 1 illustrates this aspect geometrically.
Consider two data points, $y_1$ and $y_2$, and a prototype, $v_1$, in the original
reference frame (Figure 1A). It is apparent from the figure that the projections
of each data point onto prototype $v_1$, $u_{11} v_1$ and $u_{12} v_1$, correspond to close
values. Let us consider the reference frame centered on the mean of the
data, $\bar{y}$, instead (Figure 1B). Now, the prototype is $v_1'$, and the projections,
$u_{11}' v_1'$ and $u_{12}' v_1'$, of each data point become much more distinct from each
other. A similar construction may be devised for prototype $v_2$ and $v_2'$ in
each reference frame. Moreover, if the original reference frame is used, one
may question the utility of having two prototypes ($v_1$ and $v_2$), whereas in
the transformed reference frame the corresponding prototypes ($v_1'$ and $v_2'$)
become very much distinct.
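
The effect of the origin can be checked numerically with a toy example. The sketch
below uses the unconstrained projection coefficient $\langle y, v \rangle / \langle v, v \rangle$,
which minimizes the squared distance between $y$ and $u v$ over $u$; it ignores the
membership constraints, so the coefficients are not memberships. All data values and
function names are made up for illustration.

```python
def projection_coefficient(y, v):
    """Unconstrained u minimizing ||y - u*v||^2, i.e. <y, v> / <v, v>."""
    return sum(a * b for a, b in zip(y, v)) / sum(b * b for b in v)


def shift(point, origin):
    """Coordinates of a point in a frame whose origin is moved to `origin`."""
    return [a - o for a, o in zip(point, origin)]


# Two data points and one prototype, far from the original origin.
y1, y2, v1 = [10.0, 12.0], [12.0, 11.0], [13.0, 12.0]

# Original reference frame: both projections are almost the same.
gap_original = abs(projection_coefficient(y1, v1) - projection_coefficient(y2, v1))

# Frame centered on the mean of the data: the projections separate clearly.
centre = [(a + b) / 2 for a, b in zip(y1, y2)]
v1_c = shift(v1, centre)
gap_centred = abs(
    projection_coefficient(shift(y1, centre), v1_c)
    - projection_coefficient(shift(y2, centre), v1_c)
)
```

In this example the gap between the two coefficients grows from roughly 0.04 in the
original frame to roughly 0.8 after centering.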



2.4 FCPM Algorithm 

An FCM-like alternating optimization (AO) algorithm has been developed in
order to minimize the FCPM criteria [2]. Each iteration of the algorithm consists
of two steps as follows.

Fig. 1. How the distinction between prototypes $v_i$ and the corresponding projections,
$u_{ik} v_i$, of data points $y_k$ is affected by the choice of the origin of the space in the
framework of the FCPM model. A. Data points and prototypes in the original reference
frame; B. The same data points and prototypes in the transformed reference frame,
whose position relative to the original reference frame is depicted in A.



First, given a prototype matrix V, the optimal membership values are found
by minimizing one of the criteria (3). In contrast to FCM, minimization of
each criterion subject to constraints (1a) and (1b) is not an obvious task; it
requires an iterative solution on its own. The gradient projection method [8], [9]
has been selected for finding optimal membership values (given the prototypes).
It can be proven that the method converges fast for FCPM-0 with a constant
(anti)gradient stepsize.

Let us denote the set of membership vectors satisfying conditions (1a) and
(1b) by $Q$. The calculations of the membership vectors $u_k^{(t)} = [u_{ik}^{(t)}]$ are based
on vectors $d_k^{(t)} = [d_{ik}^{(t)}]$:

$$d_{ik}^{(t)} = u_{ik}^{(t-1)} - \alpha_m \left[ (m+2) \langle v_i, v_i \rangle \big(u_{ik}^{(t-1)}\big)^{m+1} - 2(m+1) \langle v_i, y_k \rangle \big(u_{ik}^{(t-1)}\big)^{m} + m \langle y_k, y_k \rangle \big(u_{ik}^{(t-1)}\big)^{m-1} \right], \qquad (6)$$

where $\alpha_m$ is a stepsize parameter of the gradient method. Then, $u_k^{(t)}$ is to be
taken as the projection of $d_k^{(t)}$ in $Q$, denoted by $P_Q(d_k^{(t)})$^1. The process stops
when the condition $\|U^{(t)} - U^{(t-1)}\| < \varepsilon$ is fulfilled, with an appropriate
matrix norm.

^1 The projection $P_Q(d_k^{(t)})$ is based on an algorithm we developed for projecting a
vector over the simplex of membership vectors; its description is omitted
here.
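
A rough sketch of one "minor" iteration for a single entity is shown below. Two points
are assumptions rather than the authors' method: the gradient is the derivative of
$u^m \|y - u v\|^2$ with respect to $u$ (any constant factor is absorbed in the
stepsize), and, because the authors' own projection algorithm is omitted, the standard
sort-based Euclidean projection onto the probability simplex is swapped in. All
function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gradient_step(u, v, y, m, alpha):
    """Move membership u against the gradient of u^m * ||y - u*v||^2."""
    grad = (m + 2) * dot(v, v) * u ** (m + 1) - 2 * (m + 1) * dot(v, y) * u ** m
    if m > 0:  # the last term vanishes for m = 0 (and would divide by zero at u = 0)
        grad += m * dot(y, y) * u ** (m - 1)
    return u - alpha * grad


def project_to_simplex(d):
    """Euclidean projection of d onto {u : u_i >= 0, sum_i u_i = 1} (sort-based)."""
    ordered = sorted(d, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # the last j that passes the test fixes the threshold
    return [max(x - theta, 0.0) for x in d]


def membership_step(u_k, V, y, m, alpha):
    """One gradient projection step for the membership vector of entity y."""
    d_k = [gradient_step(u, v, y, m, alpha) for u, v in zip(u_k, V)]
    return project_to_simplex(d_k)


V = [[1.0, 0.0], [0.0, 1.0]]
new_u = membership_step([0.5, 0.5], V, y=[0.5, 0.5], m=2, alpha=0.1)
```

Repeating `membership_step` until successive membership vectors stop changing
corresponds to the stopping condition described above.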

Second, given a membership matrix U, the optimal prototypes are determined
according to the first-order optimality conditions as

$$v_{ih}^{(t)} = \frac{\sum_{k=1}^{n} \big(u_{ik}^{(t)}\big)^{m+1} y_{kh}}{\sum_{k=1}^{n} \big(u_{ik}^{(t)}\big)^{m+2}}, \qquad (7)$$

with m = 0, 1, 2. Thus, the algorithm consists of “major” iterations of updating
matrices U and V and “minor” iterations of recalculation of membership values
in the gradient projection method within each of the “major” iterations. The
algorithm starts with a set of c arbitrarily specified prototype points in
$\mathbb{R}^p$ and it stops when the difference between successive prototype matrices
becomes small or a maximum number of iterations $t_{1\max}$ is reached (in our
experiments $t_{1\max} = 100$).
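
The prototype update follows from setting the derivative of the criterion with respect
to each prototype coordinate to zero, which gives a ratio of two membership-weighted
sums. A minimal sketch, with the hypothetical name `update_prototypes` and the same
U[i][k] layout as assumed elsewhere, is:

```python
def update_prototypes(Y, U, m):
    """v_ih = sum_k u_ik^(m+1) * y_kh / sum_k u_ik^(m+2) for every cluster i, feature h.

    Assumes every cluster has at least one non-zero membership, so that the
    denominator is positive.
    """
    p = len(Y[0])
    V = []
    for u_i in U:
        denominator = sum(u ** (m + 2) for u in u_i)
        V.append([
            sum((u ** (m + 1)) * y[h] for u, y in zip(u_i, Y)) / denominator
            for h in range(p)
        ])
    return V


Y = [[1.0, 0.0], [0.0, 2.0]]
crisp = update_prototypes(Y, [[1.0, 0.0], [0.0, 1.0]], m=2)
fuzzy = update_prototypes(Y, [[0.5, 0.0], [0.5, 1.0]], m=0)
```

With crisp memberships each prototype coincides with its single entity. In the fuzzy
case, the entity that belongs to cluster 0 only to degree 0.5 pushes that prototype out
to twice its own coordinates, which is the extreme behaviour discussed in the text.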

The algorithm converges only locally (for FCPM-1 and FCPM-2). Moreover, 
with a “wrong” number of clusters pre-specified, FCPM-0 may not converge at 
all due to its peculiar behaviour: it may shift some prototypes to infinity [2]. 



3 Experimental Study 



3.1 Contribution Weights 



The concept of ‘importance weight’ of a variable is useful in cluster analysis.
In particular, the following relative contributions of a variable $h$ to a cluster $i$,
$w(h|i)$, and to the overall data scatter, $w(h)$, will be considered in this study [6],
[7]:



w{h\i) = 



}{h) = 






■^ih 



h 

c 

E 



2 = 1 



'^vlh 



(8) 



(9) 



k.h 



where $\bar{v}_{ih}$ denotes the gravity center of (hard) cluster $i$, i.e.
$\bar{v}_{ih} = \sum_{k \in C_i} y_{kh} / n_i$, with $C_i = \{k : u_{ik} = 1\}$, and
$n_i = |C_i|$, the number of entities belonging to cluster $i$. Note that the farther
$\bar{v}_{ih}$ is from zero (which, due to our data standardization, is the grand mean),
the easier it is to separate that cluster from the other ones in terms of variable $h$,
which is reflected in the weight values. Due to this, $w(h|i)$ might be viewed as a
measure of the “degree of interestingness” of variable $h$ in cluster $i$ with regard
to its “standard” mean value.

Let us define the ‘most contributing’ features within a cluster and to the
overall data structure as the ones greater than the averages of (8) and (9),
$\sum_h w(h|i) / p$ and $\sum_h w(h) / p$ respectively, which may act as frontier values.
Table 1. Relative contributions of the 17 psychosomatic variables ($h_1$-$h_{17}$) to classes
D, M, Ss and Sp, and to the data scatter according to the original partition structure.
The feature values higher than the corresponding averages are marked.





         D        M        Ss        Sp        General
h        w(h|D)   w(h|M)   w(h|Ss)   w(h|Sp)   w(h)
h1       4.27     0.22     0.04      2.07
h2       1.67     2.19     2.02      477T7I    0.03
hs       1.00     2.28     3.44      1.39      4.28
         3.43     1.76     0.53      2.79      3.97
h12      1.76     0.71     1.00      427171    3.02
h13      477181   16.261   0.13      77f2      7710
h14      1.36     0.46     3.01      15.3,51   1.92
h15      0.59     0.49     0.70      16.4,51   1.54
h16      0.00     187191   1277121   1.27      TUT
h17      [5T44]   4T740    4T7T4I    2.00      777
w        5.00     4.89     4.50      4.60      4.01



3.2 Fuzzy Clustering of Mental Disorders Data 

A mental disorders data set [6] was chosen, consisting of 44 patients, described
by seventeen psychosomatic features ($h_1$-$h_{17}$). The features are measured on a
severity rating scale taking values of 0 to 6. The patients are partitioned into four
classes of mental disorders: depressed (D), manic (M), simple schizophrenic (Ss)
and paranoid schizophrenic (Sp). Each class contains eleven consecutive entities
that are considered ‘archetypal psychiatric patients’ of that class.

The mental disorders data set shows several interesting properties. First,
there is always a pattern of features (a subset of $h_1$-$h_{17}$) that take extreme
values (either 0 or 6) and clearly distinguish each class. Better still, some of
these features take opposite values among distinct classes. However, some feature
values are shared by classes, leading to overlaps. Given these characteristics, each
disease is characterized by ‘archetypal patients’ that show a pattern of extreme
psychosomatic feature values defining a syndrome.

Table 1 presents the relative weights, $w(h|i)$ (defined by (8)), of features
$h_1$-$h_{17}$ to classes D, M, Ss and Sp and to the data scatter, $w(h)$ (9), according
to the original partition structure. The corresponding averages, $\bar{w}$, are listed in
the bottom row. All values are in percent, and the values corresponding to the
‘most contributing’ features are marked.
The algorithms FCPM-2 and FCM (with its parameter m = 2) have been
run starting from the same initial points, setting the number of clusters to
four (i.e. c = 4). Table 2 shows the prototypes found by FCM and FCPM-2,
in the original data scale, where the marked values belong to the set of the
most contributing weight features within a cluster. Comparing the results, it
is clear that the feature values of FCPM-2 prototypes are more extreme than
the corresponding ones obtained by FCM (which is in accordance with our previous
studies [2]). Moreover, in the prototypes found by FCPM-2 some of the feature
values move outside the feature scale. In fact, all these ‘extreme’ values belong
to the set of features that contribute the most to the whole partition or, at
least, to a cluster (Table 1). In particular, these features present the highest
discriminating power to separate the corresponding cluster from the remaining
ones.

This property of FCPM prototypes is in line with suggestions in data
mining that the more deviant a feature is from a standard (the grand mean, in
this case), the more interesting it is.

Concerning the membership values found, both algorithms assign the highest
belongingness of an entity to its original class, correctly clustering all entities to
the corresponding class^2.



3.3 Clustering of Augmented Mental Disorders Data 

To study how FCPM-2 reveals extreme prototypes, as opposed to the central
prototypes of FCM, the original data set was modified to include less pronounced
cases. To achieve that, each class was augmented with six mid-scale patients and
three light-scale patients. Each new patient, $x_g = [x_{gh}]$, was generated from a
randomly selected original patient, $x_k = [x_{kh}]$, by applying the transformation
$x_{gh} = \mathrm{round}(sp \cdot x_{kh}) + t$ (for all $h = h_1, \ldots, h_{17}$), with scale
factor $sp = 0.6$ to obtain a mid-scale patient and $sp = 0.3$ to obtain a light-scale
patient. The shift parameter $t$ is randomly set to 0 or 1.
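
The transformation is simple enough to sketch directly. Two details are assumptions,
since the text leaves them open: a single shift t is drawn per generated patient
(rather than per feature), and Python's built-in `round`, which rounds exact halves to
the nearest even integer, stands in for the unspecified rounding rule. The function
name `augment_patient` is hypothetical.

```python
import random


def augment_patient(patient, scale, rng):
    """Generate a less severe patient: round(scale * x) + t for every feature.

    patient: list of severity ratings on the 0..6 scale.
    scale:   0.6 for a mid-scale patient, 0.3 for a light-scale patient.
    rng:     a random.Random instance; t is drawn once per patient (assumption).
    """
    t = rng.randint(0, 1)  # shift of 0 or 1
    return [round(scale * x) + t for x in patient]


rng = random.Random(0)
original = [6, 0, 3, 5]
mid_scale = augment_patient(original, 0.6, rng)
light_scale = augment_patient(original, 0.3, rng)
```

For the example ratings, the scaled and rounded values are [4, 0, 2, 3] for a
mid-scale patient and [2, 0, 1, 2] for a light-scale patient, before the shift t is
added.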

Table 3 shows the prototypes ($v$) found in the original data set followed by
the corresponding ones ($v'$) for the augmented data, for each algorithm, where the
values of features with the highest weights are also marked.

Most of the extreme feature values of the FCM prototypes (in the $v$’s) move towards
intermediate feature values (in the $v'$’s), showing the tendency of FCM to reveal
central prototypes. In contrast, in the FCPM-2 prototypes the most heavily weighted
features maintain their full-scale (i.e. extreme) values, reinforcing the ‘extreme’
nature of FCPM-2 prototypes and, consequently, their sensitivity to the most
discriminating features, despite the presence of new mid- and light-scale patients.

^2 The only exception occurs for entity (21) from class M, which is assigned to class Sp;
the same phenomenon is reported in [6] for other clustering algorithms such as complete
linkage, K-means and separate-and-conquer.
Table 2. Cluster prototypes ($v_D$, $v_M$, $v_{Ss}$, $v_{Sp}$), revealed by FCM and FCPM-2,
presented in the original space. Values corresponding to the most contributing weight
features are marked.




4 Conclusion 

Modelling the concept of ‘ideal type’ through the notion of ‘proportional mem- 
bership’ appears to be a promising aspect to be explored in the FCPM model. 

The results of the experimental study clearly outline the extreme behaviour 
of FCPM prototypes in contrast with the central tendency of FCM prototypes, 
particularly when the data set was augmented with new patients more or less 
similar to the cluster prototypes. Since the extreme feature values of FCPM 
correspond to the most contributing weights, the proportional membership has 
a discriminating power to separate clusters from each other. Moreover, this ex- 
treme behaviour appears to be compatible with the notion of “interestingness” 
in data mining, since it is sensitive to the feature-values that are farthest from 
the average. 

In terms of applications, the concept of ‘ideal type’ fits to such domain areas 
as psychiatry or market research, where a prototype (such as mental disturbance 
or consumer profile, in the examples given) is well characterized by extreme 
conditions. Modeling such a concept through fuzzy clustering seems appealing 
and useful as a part of the decision making process in those application areas. 

Extending this preliminary experimental study to larger data sets is
essential to firmly establish the results found and to better outline the
model properties and its relevance in the application to several domain areas.
Table 3. Prototypes found by FCM and FCPM-2, in the original data set ($v$’s) and in
the augmented data set ($v'$’s), characterizing each mental disorder: D, M, Ss and Sp.
Abstract. The most common (or even only) choice of activation func-
tions (AFs) for multi-layer perceptrons (MLPs) widely used in research, 
engineering and business is the logistic function. Among the reasons for 
this popularity are its boundedness in the unit interval, the function’s 
and its derivative’s fast computability, and a number of amenable math- 
ematical properties in the realm of approximation theory. However, con- 
sidering the huge variety of problem domains MLPs are applied in, it 
is intriguing to suspect that specific problems call for specific activation 
functions. Biological neural networks with their enormous variety of neu- 
rons mastering a set of complex tasks may be considered to motivate this 
hypothesis. We present a number of experiments evolving structure and 
activation functions of generalized multi-layer perceptrons (GMLPs) us- 
ing the parallel netGEN system to train the evolved architectures. For the 
evolution of activation functions we employ cubic splines and compare 
the evolved cubic spline ANNs with evolved sigmoid ANNs on synthetic 
classification problems which allow conclusions w.r.t. the shape of de- 
cision boundaries. Also, an interesting observation concerning Minsky’s 
Paradox is reported. 



1 Introduction 

While many researchers have investigated a variety of methods to improve
Artificial Neural Network (ANN) performance by optimizing training methods, learn
parameters, or network structure, comparatively little work has been done on
using activation functions other than the logistic function. It has been shown
that a two-hidden-layer MLP with sigmoidal activation function can implement
arbitrary convex decision boundaries [1]. Moreover, such an MLP is capable of
forming an arbitrarily close approximation to any continuous nonlinear mapping
[2]. However, common to these theorems is that the number of neurons in the
layers is at best bounded, but can be extremely large for practical purposes. E.g.,
for a one-hidden-layer MLP Barron (1993) derived a proportionality of the sum
square error (SSE) to the number of neurons according to $SSE \sim 1/T$, with $T$
being the number of neurons (for target functions having a Fourier
representation) [3].

If we consider the simple example of a rectangular function, it becomes
obvious that a rectangular activation function can approximate this target function
much more easily than a sigmoidal function.



1.1 Related Work 

Liu and Yao (1996) evolved the structure of Generalized Neural Networks (GNN)
with two different activation function types, namely, sigmoid and Gaussian basis
function. Experiments with the Heart Disease data set from the UCI machine
learning benchmark repository revealed small differences in classification error
slightly favoring GNNs over ANNs solely using sigmoid or Gaussian basis
activation functions [4].

Vecci et al. (1998) introduced the Adaptive Spline Neural Network (ASNN),
where activation functions are described by Catmull-Rom cubic splines. The ANN
error (including a regularization term) then depends not only on the weights,
but also on the spline parameters, which are adapted together with the weights
by backpropagation learning. It has been reported that ASNNs yield a
performance comparable to networks using the logistic AF, while using a smaller
number of hidden neurons than the conventional network [5].

Sopena et al. (1999) presented a number of experiments (with widely-used
benchmark problems) showing that multilayer feed-forward networks with a sine
activation function learn two orders of magnitude faster while generalization
capacity increases (compared to ANNs with logistic activation function) [6].

2 Evolution of ANN Structure and Activation Functions 

The technical platform for the evolution of GMLPs is the netGEN system search- 
ing a problem-adapted ANN architecture by means of an Evolutionary Algorithm 
(EA), and training the evolved networks using the Stuttgart Neural Network Sim- 
ulator (SNNS) [7]. In order to speed up the evolutionary process, an arbitrary 
number of workstations may be employed to train individual ANNs in parallel. 

The ANN’s genetic blueprint is based on a direct encoding suggested by
Miller et al. [8]. The basic structure of the genotype is depicted in Figure 1.



Fig. 1. The organization of the ANN genotype (sections: Learn Parameters,
Neuron Parameters, Structure Parameters).



The binary encoded GMLP chromosome contains three main sections. The
Learn Parameters encode values used for ANN training, the Neuron Parameters
are used to describe the activation function of each neuron, and the Structure
Parameters explicitly specify each connection of the network (direct encoding).

Fig. 2. Details of the learn parameter section (above) and the neuron parameter
section (below) of the ANN genotype.



Figure 2 shows the details of the learn parameter and the neuron parameter 
section. 

The learn parameter section contains the number of epochs and two learning
parameters for the selected training algorithm Resilient Back-propagation
(RProp) [9]. The evolution of the number of epochs for ANN training is a
measure to avoid overfitting with the additional benefit of another critical ANN
parameter being discovered by evolution.

The most interesting feature of the neuron parameter section is the Markers:
single bases (bits) which are a simple analogue to activators/repressors regulating
the expression of wild-type genes. A hidden neuron marker determines whether the
specific neuron associated with it is present in the decoded network. As the
number of output neurons is usually strictly dependent on the specific problem
the ANN should learn, these neurons are not controlled by markers.

The activation function for each neuron is composed of at most n control
points of a cubic spline. In order to also allow fewer control points,
each control point is “regulated” by a point marker. It should be noted that, 
if a neuron marker indicates the absence (or pruning) of a particular neuron, 
all control points for the cubic spline activation function and all connections 
associated with that neuron become obsolete. As a consequence, most ANN 
chromosomes contain noncoding regions. When using the logistic AF, only the 
hidden neuron markers are contained in the neuron parameter section. 

ANN structure is represented by a linearized binary adjacency matrix. As in 
this work we are only concerned with feed-forward architectures, all elements of 
the upper triangle matrix must be 0, hence they are not included in the ANN 
genotype. Due to this linearization we are able to use the standard 2-point 
crossover operator for recombination. Technically, the hidden neuron markers 
are stored in the main diagonal of the connection matrix constructed from the 
genotype’s structure section as shown in Fig. 3 for an exemplary network. 

The maximum number of hidden neurons (neuron markers) has to be set in 
advance with this encoding scheme, hence, it could be labeled as Evolutionary 
Pruning, since the system imposes an upper bound on the complexity of the 
network. 
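The structure encoding described above can be illustrated with a minimal Python sketch. It assumes that the lower triangle of the connection matrix (including the main diagonal, which holds the hidden neuron markers) is linearized row by row and that neurons are ordered inputs, hidden, outputs. The linearization order, the neuron ordering, and the name `decode_structure` are illustrative assumptions, not details taken from netGEN.

```python
def decode_structure(bits, n_in, n_hidden, n_out):
    """Rebuild a feed-forward connection matrix from a linearized genotype.

    bits holds the lower triangle (diagonal included), row by row (assumed order).
    conn[i][j] == 1 means neuron j feeds neuron i, with j < i.
    Diagonal entries of hidden neurons act as presence markers.
    """
    n = n_in + n_hidden + n_out
    expected = n * (n + 1) // 2
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")

    # Unpack the row-major lower triangle; the upper triangle stays 0.
    matrix = [[0] * n for _ in range(n)]
    pos = 0
    for i in range(n):
        for j in range(i + 1):
            matrix[i][j] = bits[pos]
            pos += 1

    # Input and output neurons are always present; hidden ones need their marker.
    active = [True] * n
    for h in range(n_in, n_in + n_hidden):
        active[h] = matrix[h][h] == 1

    # Keep connections between active neurons only; genes that belong to a
    # pruned neuron are simply ignored (noncoding regions). Input neurons
    # receive no incoming connections.
    conn = [[0] * n for _ in range(n)]
    for i in range(n_in, n):
        for j in range(i):
            if matrix[i][j] and active[i] and active[j]:
                conn[i][j] = 1
    return active, conn
```

With two inputs, one hidden neuron and one output, the genotype has 10 structure bits; clearing the hidden neuron's diagonal bit removes the neuron together with all of its connections, while the remaining bits stay in the chromosome unused.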
Fig. 3. Genotype/Phenotype mapping of ANN structure.



2.1 ANN Fitness Function 

The ANN fitness function comprises a complexity regularization term
$\mathcal{E}_c = |C_{total}|$, with $C_{total}$ being the set of all network
connections. The Composite Fitness Function $\mathcal{F}$ is given by a weighted
sum of Model Fitness and Complexity Fitness

$$\mathcal{F} = \alpha_1 \frac{1}{1 + \mathcal{E}_m} + \alpha_2 \frac{1}{1 + \mathcal{E}_c} \qquad (1)$$

with $\alpha_1 + \alpha_2 = 1.0$, and $\mathcal{E}_m$ being the model error. Earlier
work showed that $\alpha_2$ in the range of 0.001 to 0.01 is sufficient to guide the
evolution towards ANNs of low complexity. In effect the complexity term acts as a
“tie-breaker” between different ANNs with (nearly) identical model error, where
the less complex network receives a slightly better fitness. In this work we set
$\alpha_1 = 0.99$.
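As a small illustration of Eq. (1), the following Python sketch evaluates the composite fitness for a given model error and connection count. The function and parameter names are illustrative only; the default weight follows the value of 0.99 stated above.

```python
def composite_fitness(model_error, n_connections, alpha1=0.99):
    """Weighted sum of model fitness and complexity fitness, as in Eq. (1)."""
    alpha2 = 1.0 - alpha1                       # weights sum to 1.0
    model_fitness = 1.0 / (1.0 + model_error)
    complexity_fitness = 1.0 / (1.0 + n_connections)
    return alpha1 * model_fitness + alpha2 * complexity_fitness


# Two networks with identical model error: the smaller one wins the tie.
small = composite_fitness(model_error=0.02, n_connections=5)
large = composite_fitness(model_error=0.02, n_connections=10)
```

Because the complexity weight is small, a network with a clearly lower model error still receives the better fitness even if it uses more connections.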



2.2 EA and ANN Parameters 

The following EA and ANN parameters have been used with all the experiments 
in this paper: 

EA Parameters: Population Size = 50, Generations = 50, Crossover Probability
$p_c$ = 0.6, Mutation Probability $p_m$ = 0.005, Crossover = 2-Point, Selection
Method = Binary Tournament.

ANN Parameters: Network Topology = Generalized Multi-Layer Perceptron,
Activation Function (hidden and output neurons) = Sigmoid or evolved Cubic
Splines, Output Function (all neurons) = Identity, Training = RProp, Learning
Parameters $\Delta_0$ (max. 1.0), $\Delta_{max}$ (max. 50.0), Number of Training
Epochs (max. 1000) = Evolutionary.

2.3 Cubic Spline Parameters

A cubic spline activation function is described by $n$ control points
$(x_i, y_i)$ with $x_i, y_i \in \mathbb{R}$, where $i = 1, \ldots, n$. These $n$
points define $n - 1$ intervals on the x-axis denoted by $x \in [x_i, x_{i+1}]$
with $x_{i+1} > x_i$. In each of these intervals a function $f_i(x)$ is
defined by

$$f_i(x) = f_i(x_i) + a_i (x - x_i) + b_i (x - x_i)^2 + c_i (x - x_i)^3 \qquad (2)$$

with constants $a_i, b_i, c_i \in \mathbb{R}$. Demanding equality of the function
value, and of the first and second derivative, at the interval borders, the
constants can be determined for each interval, yielding a continuous and
differentiable function composed of a number of cubic splines.
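The construction in Eq. (2) can be sketched in plain Python as follows. The continuity conditions leave two degrees of freedom open; this sketch fixes them with natural boundary conditions (zero second derivative at both ends) and clamps inputs to the sensitivity interval. Both choices, as well as the name `natural_cubic_spline` and the example control points, are assumptions made for illustration and are not stated in the text.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return f(x) interpolating the control points with a natural cubic spline."""
    n = len(xs)
    if n < 3 or len(ys) != n:
        raise ValueError("need at least three control points")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    if any(step <= 0 for step in h):
        raise ValueError("x values must be strictly increasing")

    # Second derivatives m[i] at the control points; natural ends: m[0] = m[-1] = 0.
    m = [0.0] * n
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n - 1)]
    # Thomas algorithm: forward elimination of the tridiagonal system ...
    for k in range(1, n - 2):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    # ... followed by back substitution (m[n - 1] stays 0).
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def f(x):
        x = min(max(x, xs[0]), xs[-1])            # clamp to the sensitivity interval
        i = min(bisect.bisect_right(xs, x) - 1, n - 2)
        dx = x - xs[i]
        # Coefficients a_i, b_i, c_i of Eq. (2) for interval i.
        a = (ys[i + 1] - ys[i]) / h[i] - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0
        b = m[i] / 2.0
        c = (m[i + 1] - m[i]) / (6.0 * h[i])
        return ys[i] + a * dx + b * dx ** 2 + c * dx ** 3

    return f


# Hypothetical activation function: 8 control points on [-10, 10], y in [0, 1].
xs = [-10.0 + 20.0 * k / 7 for k in range(8)]
ys = [0.1, 0.9, 0.2, 0.8, 0.0, 1.0, 0.3, 0.5]
af = natural_cubic_spline(xs, ys)
```

Note that an interpolating spline passes through every control point but may overshoot the range of the y-values between them, so the resulting activation can leave the activation interval even when all control points lie inside it.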

With respect to the ANN genotype (Fig. 2) the maximum number of control
points $n_c$ has to be set prior to the evolutionary run. Also, the x- and
y-range of the cubic spline activation function has to be fixed to specific
intervals $[x_{min}, x_{max}]$ (sensitivity interval) and $[a_{min}, a_{max}]$
(activation interval), respectively. However, within these bounds the
evolutionary process may freely float.

For all experiments we set the number of control points $n_c = 8$ and the
activation interval $[a_{min}, a_{max}] = [0.0, 1.0]$. The sensitivity interval
depends on the specific problem.

3 Experimental Setup 

The experiments have been devised in order to answer two main questions. The
first one is simply whether the rather complex organization of the ANN genotype
containing genetic information on the neurons’ activation function allows the
evolutionary process to find any valid and reasonable networks. Thus, like cohorts
of ANN researchers, we start out with the very simple but still interesting XOR
problem.

The second question to be addressed is the performance of a cubic spline
network in comparison to a GMLP with conventional, sigmoidal AFs. Intuitively,
the decision boundaries in classification problems could be modeled more easily
by cubic splines of great plasticity than by the “frozen” logistic function. Thus,
we choose the simple continuous XOR problem with piece-wise linear decision
boundaries, and the synthetic, but nevertheless hard Two Spirals problem with
complex, nonlinear decision boundaries.

During the evolutionary process the ANN structure is trained on a training 
set and evaluated on a validation set (ANN fitness). Finally, the performance of 
the evolved best net is measured on a test set. Each evolutionary run is repeated 
20 times. 

For the XOR problem we use a network with two input neurons and one output
neuron. The binary output value 1 is defined by an activation of the output
neuron $o_o > 0.5$, otherwise the output is mapped to 0. All pattern sets contain
just the four possible example patterns.

The continuous XOR (cXOR) problem is defined by a division of the unit
square into four squares of identical area being separated by the lines x = 0.5
and y = 0.5. The lower left and the upper right square are labeled with
binary value 1, the two remaining squares represent the class with binary value 0.
The corresponding ANN has an I/O-representation identical to the simple XOR
problem. The training set is composed of 100 randomly selected points in the
unit square, the validation set contains 200 random patterns, and the test set is
made of 10,000 equally spaced points covering the complete unit square.
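The cXOR pattern sets described above could be generated along the following lines. This is a sketch under assumptions: the random seed, the uniform sampling, and the placement of the 100 x 100 test grid at cell centers (so that no test point falls exactly on a decision boundary) are illustrative choices, not details given in the text.

```python
import random


def cxor_label(x, y):
    """Class 1 for the lower left and upper right square, class 0 otherwise."""
    return 1 if (x < 0.5) == (y < 0.5) else 0


def random_patterns(count, rng):
    """Uniformly sampled points of the unit square with their cXOR labels."""
    points = [(rng.random(), rng.random()) for _ in range(count)]
    return [(x, y, cxor_label(x, y)) for x, y in points]


def grid_patterns(side=100):
    """side * side equally spaced points (cell centers) covering the unit square."""
    coords = [(k + 0.5) / side for k in range(side)]
    return [(x, y, cxor_label(x, y)) for x in coords for y in coords]


rng = random.Random(0)                 # fixed seed, for reproducibility only
train = random_patterns(100, rng)
validation = random_patterns(200, rng)
test = grid_patterns(100)
```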

The task in the Two Spirals problem is to discriminate between two sets of 
points which lie on two distinct spirals in the plane. These spirals coil three 
times around the origin and around one another [10]. Training, validation, and 
test set comprise 194 patterns each. The basic ANN structure is defined by two 
input neurons and two output neurons (one for each spiral, winner takes all). 
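For orientation, a frequently used generator for the Two Spirals benchmark is sketched below. The constants (97 points per spiral, outer radius 6.5, angular step of pi/16) are assumptions based on the commonly circulated form of the benchmark; they are consistent with the 194 patterns and three turns mentioned above, but the text does not state how its pattern sets were produced or scaled.

```python
import math


def two_spirals(points_per_spiral=97):
    """Return (x, y, spiral) patterns for two interlocked spirals."""
    patterns = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0              # 96 steps cover three full turns
        radius = 6.5 * (104 - i) / 104.0        # radius shrinks towards the origin
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        patterns.append((x, y, 0))              # first spiral
        patterns.append((-x, -y, 1))            # second spiral: point reflection
    return patterns


patterns = two_spirals()
```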

For these classification problems the model error in the fitness function is
simply given by $\mathcal{E}_m = e_v / n_v$, where $e_v$ is the number of
misclassifications on the validation set, and $n_v$ its respective size.



4 Experimental Results 

For the XOR problem we only evolved networks with cubic spline AFs. We set 
the maximum number of hidden neurons to three and the sensitivity interval of 
the neurons’ cubic spline AF to [-10.0, 10.0]. In order to enable a more detailed
fitness assignment we used $\mathcal{E}_m$ = MSE for the model error (Eq. 1). After 20
runs the average MSE of the single best nets of each run was 6.45 x 10“®. Four 
of the best nets had a single hidden neuron, while the other 16 ANNs simply 
consisted of the two linear inputs and the output neuron with a cubic spline AF. 
These two simple structures are depicted in Figure 4. 




Fig. 4. Only two evolved XOR structures (white/input neuron, black/neuron with 
cubic spline AF). 



The details of two evolved XOR nets are shown in Fig. 5. 

Fig. 5. Evolved XOR nets with weights, bias, and cubic spline activation function.

When performing the calculations for the four possible input patterns, it can
be seen that only a small portion of the sensitivity interval is used. Specifically,
the big peak in each AF exceeding the activation interval seems to be useless.
However, it could be argued that the steep slope provided by these peaks gives
useful guidance to the training algorithm.

The most interesting property of these networks is that they seem to
contradict the well-known fact that a single perceptron cannot learn the XOR
function (Minsky’s Paradox [11]). In a strict sense a perceptron is defined as
having a threshold AF, but even a logistic AF does not resolve Minsky’s Paradox.
While these functions have the desirable non-linearity, they are monotonic, in
contrast to the non-monotonic evolved cubic spline AFs. In all the reported
aftermath of Minsky’s work, it seems ironic that a simple triangle activation
function would have resolved the paradox. Recently, Rueda and Oommen (2000)
reported on this observation in the context of statistical pattern recognition,
proposing pairwise linear classifiers [12]. A non-monotonic triangle AF, being a
simplification of the sinusoidally shaped parts of the cubic spline AFs in
Fig. 5, transforms the ANN into a pairwise linear classifier.
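The argument about non-monotonic activation functions can be checked with a small Python sketch of a single neuron using a triangle AF. The weights, the bias and the position and width of the triangle are hand-picked for illustration; they are not the evolved values shown in Fig. 5.

```python
def triangle(s, center=1.0, half_width=1.0):
    """Non-monotonic triangle AF: 1 at the center, falling linearly to 0."""
    return max(0.0, 1.0 - abs(s - center) / half_width)


def single_neuron_xor(x1, x2, w1=1.0, w2=1.0, bias=0.0):
    """One neuron, no hidden layer: weighted sum followed by the triangle AF."""
    activation = triangle(w1 * x1 + w2 * x2 + bias)
    return 1 if activation > 0.5 else 0


truth_table = {(a, b): single_neuron_xor(a, b) for a in (0, 1) for b in (0, 1)}
```

The weighted sum takes the values 0, 1 and 2 for the input patterns; the triangle maps only the middle value to a high activation, which is exactly what a monotonic AF cannot do.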

In Table 1 the results for the continous XOR (cXOR) problem are presented. 



Table 1. cXOR - Structure parameters, classification accuracy (Acc) with standard
deviation (StdDev) and overall best performance of evolved ANNs (averaged on 20
runs).

AF         Structure              Validation Set     Test Set
           Hidden   Connections   Acc      StdDev    Acc      StdDev   Best
Sigmoid    2.80     10.70         0.9878   0.0053    0.9679   0.0065   0.9752
Spline     2.50      9.70         0.9780   0.0092    0.9583   0.0132   0.9767



Again, we set the maximum number of hidden neurons to three and the
sensitivity interval of the cubic splines to [-10.0, 10.0]. The results slightly favor
the sigmoid networks; however, the best overall ANN found is a cubic spline net
(Fig. 6), even though the search space of the latter is much larger. While the
genotype for the network with logistic AF consists of 39 bases, the cubic spline
net chromosome comprises 538 bases.

Fig. 6. cXOR - Overall best ANN with cubic spline AFs.



In Fig. 7 the output neuron value for the complete unit square by the best 
sigmoidal and the best cubic spline ANN is shown. 




Fig. 7. cXOR - Output neuron activation for unit square of the best evolved sigmoid
ANN (left) and the best evolved spline ANN (right) (darker = lower value).



Although the classification results of both nets are nearly identical, it can be
seen that the spline net has found a very sharp and accurate decision boundary
for x = 0.5, and a more fuzzy boundary at y = 0.5. The sigmoid net exhibits
uniform activations (due to monotonicity of the AF), but has slight problems
accurately modeling the vertical and horizontal decision boundaries.

The results of GMLPs evolved for the Two Spirals problem are presented in 
Table 2. 

The upper bound of hidden neurons has been set to 16, and the sensitivity
interval of cubic splines to [-100.0, 100.0]. In addition to the evolutionary runs
we present results (20 different initial sets of random weights) of a standard
sigmoid network with one hidden layer (16 neurons) for which the evolved learn
parameters of the best evolved sigmoid net have been used.

Table 2. TwoSpirals - Structure parameters, classification accuracy (Acc) with
standard deviation (StdDev) and overall best performance of evolved and standard
(not evolved) ANNs (averaged on 20 runs).

AF         Structure              Validation Set     Test Set
           Hidden   Connections   Acc      StdDev    Acc      StdDev   Best
Standard   16.00     64.00        0.6173   0.0281    0.6150   0.0303   0.6701
Sigmoid    14.90    130.70        0.8541   0.0273    0.8379   0.0327   0.8918
Spline     10.15     58.20        0.7090   0.0385    0.6791   0.0502   0.7887

Here the performance of the evolved sigmoidal nets is clearly superior to the 
spline networks, but the difference in search space size (234 bases sigmoid, 2682 
bases spline) might be the main reason for these results. In order to gather some 
evidence for this explanation, we started a single run evolving spline nets for 500 
generations (instead of 50) and found a network (9 hidden, 36 connections) with 
an accuracy of 0.8711 and 0.8969 on validation and test set, respectively. As this 
single run took approx. 60 hours on 22 Linux PCs (200-800MHz Pentiums), we 
are working on a substantial speedup of the evolutionary process by using (very) 
early stopping techniques for ANN training. 

In Fig. 8 the classification of the unit square by the best evolved ANNs (Table 
2) is depicted. 




Fig. 8. TwoSpirals - Unit square classification by best evolved sigmoid ANN (left) and 
best evolved spline ANN (right), and the test set (middle). 



Both evolved ANNs generate non-contiguous subspaces for the two classes
which make misclassifications inevitable. Though the spline net has lower
classification accuracy, the decision boundaries are of a shape more closely
resembling the spirals than those of the sigmoid net. However, the spirals are
only defined in the areas shown in the middle figure (Fig. 8), and these areas
are classified more accurately by the sigmoid net. A specific source of error
for the spline net is located in the upper right quadrant, where a black area
is inside the innermost grey ring (falsely) covering a part of the grey spiral.
As indicated above, further evolution of the complex cubic spline genes could
eliminate such artefacts.

5 Summary 

We have reported on experiments evolving structure and activation functions of
generalized multi-layer perceptrons. We compared evolution of ANNs with the
widely used logistic activation function in each hidden and output neuron to
networks with cubic spline activation functions. Each neuron in the spline net
can have a different nonlinear, non-monotonic activation function of (nearly)
arbitrary shape. The non-monotonicity of these activation functions led the way
to solving the XOR problem with only a single neuron, which is impossible using
monotonic activation functions.

Further experiments with the more complex two spirals problem favored the
evolved sigmoid nets. However, the one order of magnitude larger genotype of
the spline networks could be the main reason for this result. By inspecting the
decision boundaries of the two types of networks the potential of the spline nets
to model linear and circular boundaries of class subspaces with great accuracy
has been demonstrated.

Also, with the two spirals problem the number of hidden neurons and
connections of a spline network was found to be considerably smaller than that of a
sigmoid network. This is in accordance with reports on the adaptive spline neural
networks presented in [5].

Continuing work will explore the use of cubic splines with improved locality,
as a single mutation in a cubic spline gene can dramatically change the shape of
the whole activation function. More advanced classes of cubic splines avoid this
behavior, and a change of a single control point will result in only a local change
of the cubic spline in the neighborhood of this point. Moreover, techniques to
decrease the training time of ANNs are currently being investigated so as to
decrease the running time of network evolution employing the parallel netGEN
system.

Abstract. This paper describes a statistics-based approach for clustering
documents and for extracting cluster topics. Relevant Expressions
(REs) are extracted from corpora and used as clustering base features. 
These features are transformed and then by using an approach based on 
Principal Components Analysis, a small set of document classification 
features is obtained. The best number of clusters is found by Model- 
Based Clustering Analysis. Data transformations to approximate to nor- 
mal distribution are done and results are discussed. The most important 
REs are extracted from each cluster and taken as cluster topics. 



1 Introduction 

We aimed at developing an approach for automatically separating documents 
from multilingual corpora into clusters. We required no prior knowledge about 
document subject matters or language and no morphosyntactic information was 
used, since we wanted a language independent system. Moreover, we also wanted 
to extract the main topics characterizing the documents in each cluster. 

Clustering software usually needs a matrix of objects characterized by a set
of features. In order to obtain those features, we used the LocalMaxs algorithm to
automatically extract REs from corpora [1]. These REs, such as Common
agriculture policy, Common Customs, etc., provide important information about the
content of the texts. Therefore we decided to use them as base features for
clustering. The most informative REs correspond to cluster topics, as we will show.

This paper is organized as follows: feature extraction is explained in section
2; data transformations to approximate normality, clustering and
summarization in sections 3 and 4; section 5 presents and discusses the results
obtained; related work and conclusions are presented in sections 6 and 7.

2 Extracting Multiword Features from Corpora

We used the LocalMaxs algorithm, the Symmetric Conditional Probability (SCP)
statistical measure and the Fair Dispersion Point Normalization (FDPN) to extract
REs from any corpus. A full explanation of these tools is given in [1]. However,
a brief description is presented here. Thus, let us consider that an n-gram is a
string of words in any text^1. For example the word rational is a 1-gram; the
string rational use of energy is a 4-gram. LocalMaxs is based on the idea that
each n-gram has a kind of cohesion sticking the words together within the n-gram.
Different n-grams usually have different cohesion values. One can intuitively
accept that there is a strong cohesion within the n-gram (Bill, Clinton), i.e.
between the words Bill and Clinton. However, one cannot say that there is a
strong cohesion within the n-gram (or, if) or within the n-gram (of, two). Thus,
the SCP(.) cohesion value of a generic bigram (x, y) is obtained by

$$SCP((x, y)) = p(x|y) \cdot p(y|x) = \frac{p(x, y)^2}{p(x) \, p(y)} \qquad (1)$$

where p(x, y), p(x) and p(y) are the probabilities of occurrence of bigram (x, y)
and unigrams x and y in the corpus; p(x|y) stands for the conditional probability
of occurrence of x in the first (left) position of a bigram, given that y appears in
the second (right) position of the same bigram. Similarly p(y|x) stands for the
probability of occurrence of y in the second (right) position of a bigram, given
that x appears in the first (left) position of the same bigram.
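A minimal Python sketch of the bigram SCP of Eq. (1) is given below. It assumes that all probabilities are estimated as relative frequencies over the same corpus size, in which case the corpus size cancels and the measure reduces to a ratio of raw counts. The function name, the whitespace tokenization and the toy corpus are illustrative assumptions.

```python
from collections import Counter


def scp_bigram(tokens, x, y):
    """SCP((x, y)) = p(x, y)^2 / (p(x) * p(y)), using relative frequencies."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    f_x, f_y, f_xy = unigrams[x], unigrams[y], bigrams[(x, y)]
    if f_x == 0 or f_y == 0:
        return 0.0
    # With p = count / N for every term, N cancels out of the ratio.
    return f_xy ** 2 / (f_x * f_y)


tokens = "bill clinton met bill clinton of two of three".split()
strong = scp_bigram(tokens, "bill", "clinton")   # words that always co-occur
weak = scp_bigram(tokens, "of", "two")           # "of" also occurs elsewhere
```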

However, in order to measure the cohesion value of each n-gram of any size
in the corpus, the FDPN concept is applied to the SCP(.) measure and a new
cohesion measure, SCP_f(.), is obtained.

$$SCP\_f((w_1 \ldots w_n)) = \frac{p((w_1 \ldots w_n))^2}{\frac{1}{n-1} \sum_{i=1}^{n-1} p(w_1 \ldots w_i) \cdot p(w_{i+1} \ldots w_n)} \qquad (2)$$

where $p((w_1 \ldots w_n))$ is the probability of the n-gram $w_1 \ldots w_n$ in the
corpus. So, any n-gram of any length is “transformed” into a pseudo-bigram^2 that
reflects the average cohesion between each two adjacent contiguous sub-n-grams
of the original n-gram. Then, the LocalMaxs algorithm elects every n-gram whose
SCP_f(.)^3 cohesion value is a salient one (a local maximum) (see [1] for
details). Some examples of elected n-grams, i.e., REs, are: Human Rights, Human
Rights in East Timor, politique energetique and economia energetica.

^1 We use the notation $(w_1 \ldots w_n)$ or $w_1 \ldots w_n$ to refer to an n-gram
of length n.

^2 Roughly we can say that known statistical cohesion / association measures such as
Dice(.), MI(.), etc. seem to be “tailored” to measure just 2-grams. However,
by applying FDPN to those measures, it is possible to use them for measuring the
cohesion values of n-grams for any value of n [1].

^3 LocalMaxs has been used in other applications with other statistical measures, as
is shown in [1]. However, for Information Retrieval (IR) purposes, very interesting
results were obtained by using SCP_f(.), in comparison with other measures [2].
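The generalization to n-grams of any length can be sketched in Python as follows: the denominator averages, over all n - 1 split points, the product of the probabilities of the left and right sub-n-grams. Probabilities are again estimated as counts divided by the number of tokens, which is an assumption of this sketch, as are the function names and the toy corpus.

```python
def ngram_probability(tokens, ngram):
    """Relative frequency of an n-gram (given as a tuple of words) in the token list."""
    n = len(ngram)
    windows = zip(*(tokens[k:] for k in range(n)))
    count = sum(1 for window in windows if window == tuple(ngram))
    return count / len(tokens)


def scp_f(tokens, ngram):
    """SCP_f: squared n-gram probability over the average split-point product."""
    n = len(ngram)
    if n < 2:
        raise ValueError("SCP_f needs an n-gram with at least two words")
    p_whole = ngram_probability(tokens, ngram)
    products = [ngram_probability(tokens, ngram[:i]) * ngram_probability(tokens, ngram[i:])
                for i in range(1, n)]
    average = sum(products) / (n - 1)
    return p_whole ** 2 / average if average > 0 else 0.0


tokens = "human rights in east timor and human rights".split()
bigram_value = scp_f(tokens, ("human", "rights"))
trigram_value = scp_f(tokens, ("human", "rights", "in"))
```

For a bigram there is a single split point, so the value coincides with the SCP of Eq. (1). In this toy corpus the bigram has higher cohesion than the trigram that extends it, which is the kind of comparison LocalMaxs relies on when looking for local maxima.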
2.1 The Number of Features

Document clustering usually requires a matrix of documents characterized by
the smallest possible set of variables. That matrix is then conveyed to clustering
software. So, a multilingual corpus of 872,795 words with 324 documents^4 was used
in order to test our approach. LocalMaxs extracted 25,838 REs from that corpus.
Obviously, we cannot use such a high number of features for distinguishing such
a small number of objects (324 documents). However, these REs (base features)
provide the basis for building a new and reduced set of features.



2.2 Reducing the Number of Features 

Let us take the following extracted REs containing the word agricultural: agri- 
cultural products, processing of agricultural products, agricultural products receiv- 
ing refunds and common agricultural policy. For document clustering purposes, 
there is redundancy in these REs, since, for example, whenever processing of
agricultural products is in a document, agricultural products is also in the same
document, and it may happen that common agricultural policy is also in that
document. These redundancies can be used to reduce the number of features.

Thus, according to Principal Components Analysis (PCA), often the original
$m$ correlated random variables (features) $X_1, X_2, \ldots, X_m$ can be
“replaced” by a subset $Y_1, Y_2, \ldots, Y_k$ of the $m$ new uncorrelated
variables (components) $Y_1, Y_2, \ldots, Y_m$, each one being a linear
combination of the $m$ original variables, i.e., those $k$ principal components
provide most of the information of the original $m$ variables [5, pages 340-350].
The original data set, consisting of $l$ measurements of $m$ variables, is
reduced to another one consisting of $l$ measurements of $k$ principal
components. Principal components depend solely on the covariance matrix
$\Sigma$ (or the correlation matrix $\rho$) of the original variables
$X_1, \ldots, X_m$. Now we state $RE_1, RE_2, \ldots, RE_p$ as being the original
$p$ variables (REs) of the sub-ELIF corpus. Then, for a reduced set of new
variables (principal components) we would have to estimate the associated
covariance matrix of the variables $RE_1, \ldots, RE_p$. So, let the sample
covariance matrix $RE$ be the estimator of $\Sigma$.

$$RE = \begin{bmatrix} RE_{1,1} & RE_{1,2} & \ldots & RE_{1,p} \\ RE_{1,2} & RE_{2,2} & \ldots & RE_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ RE_{1,p} & RE_{2,p} & \ldots & RE_{p,p} \end{bmatrix} \qquad (3)$$



where $RE_{i,k}$ estimates the covariance $Cov(RE_i, RE_k)$. $RE$ can be seen as a
similarity matrix between REs. Unfortunately, due to the huge size of this matrix
(25,838 x 25,838), we cannot obtain principal components using available
software. Moreover, it is unlikely that PCA could achieve the reduction we need:
from 25,838 original features to $k < 324$ (the number of documents) new
features (principal components).

^4 This is part of the European Legislation in Force (sub-ELIF) corpus:
http://europa.eu.int/eur-lex.

2.3 Geometrical Representations of Document Similarity

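We can associate to the jth document the vector
$d_j = [x_{1,j}, \ldots, x_{p,j}]$^5, where $x_{i,j}$ is the original number of
occurrences of the ith RE in the jth document^6.
Now, we can have a smaller (324 x 324) covariance matrix

$$S = \begin{bmatrix} S_{1,1} & S_{1,2} & \ldots & S_{1,n} \\ S_{1,2} & S_{2,2} & \ldots & S_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ S_{1,n} & S_{2,n} & \ldots & S_{n,n} \end{bmatrix} \qquad (4)$$

where

$$S_{j,l} = \frac{1}{p-1} \sum_{i=1}^{p} (x_{i,j} - x_{\cdot j})(x_{i,l} - x_{\cdot l}) \qquad (5)$$

and $x_{\cdot j}$, meaning the average number of occurrences per RE in the jth
document, is given by^7

$$x_{\cdot j} = \frac{1}{p} \sum_{i=1}^{p} x_{i,j} . \qquad (6)$$

Then $S$ will be a similarity matrix between documents.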

Escoufier and L’Hermier [3] proposed an approach, based on PCA, to derive
geometrical representations from similarity matrices. Since $S$ is symmetric we
have $S = P \Lambda P^T$, with $P$ orthogonal ($P = [e_1, \ldots, e_n]$, the
matrix of normalized eigenvectors of $S$) and $\Lambda$ diagonal. The diagonal
elements of $\Lambda$ are the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $S$,
with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Thus $S = Q Q^T$ with

$$Q = P \Lambda^{1/2} . \qquad (7)$$

The elements of the ith row of $Q$ will be the coordinates of the point associated
with the ith document. We may consider only the coordinates corresponding to
the leading eigenvalues. Then, to assess how much of the total information is
carried by the first k components, i.e. the first k columns of $Q$, we may use^8

$$PTV(k) = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j} . \qquad (8)$$

So, by taking the first k columns of matrix $Q$ such that PTV(k) equals, say, 0.80
or more, we can reduce the initial large number of features to k < n new features
(components). However, considering the 324 documents of the corpus, if we use
the original number of occurrences of the ith RE in the jth document ($x_{i,j}$)
to obtain “similarities” (see (5)), we need the first 123 components to provide
0.85 of the total information, i.e. PTV(123) = 0.85. Although it corresponds
to 0.4% of the initial 25,838 features, large numbers of components must be
avoided in order to minimize the computational effort of the clustering process.
So, to reduce this number, we need to “stimulate” similarities between documents,
and therefore the original occurrences of the REs in documents must be transformed.

^5 We will use p for the number of REs of the corpus and n for the number of documents.

^6 These numbers of occurrences can be transformed, as we will see in Sect. 2.4.

^7 When we replace an index by a dot, a mean value has been obtained.

^8 PTV are initials for cumulative Proportion of the Total Variance.
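The selection of the number of components by PTV can be sketched in a few lines of Python. The eigenvalues are assumed to be already available (for instance from an eigendecomposition of the similarity matrix); the function names and the example values are hypothetical.

```python
def ptv(eigenvalues, k):
    """Cumulative proportion of the total variance carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_needed(eigenvalues, threshold=0.80):
    """Smallest k such that PTV(k) reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


example_eigenvalues = [5.0, 3.0, 1.0, 1.0]   # hypothetical values
```

With these example values the first two components already carry 0.80 of the total variance, while a threshold of 0.85 requires a third component.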

2.4 Transforming the Original Number of Occurrences 

As mentioned above, the geometrical representation may be obtained from
transformed occurrences. The technique we used has four phases. In the first phase
we standardize in order to correct document heterogeneity. This heterogeneity is
measured by the variation between the occurrence numbers of the different REs
inside each document. This variation may be assessed by

$$V(D_j) = \frac{1}{p-1} \sum_{i=1}^{p} (x_{i,j} - x_{\cdot j})^2 , \quad j = 1, \ldots, n \qquad (9)$$

where $x_{\cdot j}$ is given by (6). The standardized values will be

$$z_{i,j} = (x_{i,j} - x_{\cdot j}) \cdot [V(D_j)]^{-1/2} , \quad i = 1, \ldots, p ; \; j = 1, \ldots, n . \qquad (10)$$

In the second phase we evaluate the variation between documents for each RE.
However, the variation associated with each RE must reflect how much the RE occurrences
vary in different documents due to document content, not due to document
size. Therefore we use normalized values to calculate this variation:

V(RE_i) = \frac{1}{n-1} \sum_{j=1}^{j=n} (z_{ij} - z_{i \cdot})^2 , \qquad i = 1, \ldots, p .   (11)

These values are important since we found that, generally, the higher the V(RE_i),
the more information is carried by the ith RE. On the other hand, it was
observed that REs composed of long words are usually more informative from
the IR / Text Mining point of view (e.g. agricultural products or communauté
économique européenne are more informative than same way, plus au moins or
reach the level). Thus, in a third phase we define weighted occurrences as

x^*_{ij} = x_{ij} \cdot V(RE_i) \cdot AL(RE_i) , \qquad i = 1, \ldots, p; \; j = 1, \ldots, n   (12)

where AL(RE_i) is the average number of characters per word in the ith RE.
Lastly, in the fourth phase we carry out a second standardization considering
the weighted occurrences. This corrects document size heterogeneity, since
we do not want the size of a document to affect its relative importance. Thus

z^*_{ij} = (x^*_{ij} - x^*_{\cdot j}) \cdot [V(D^*_j)]^{-1/2} , \qquad i = 1, \ldots, p; \; j = 1, \ldots, n   (13)

where

V(D^*_j) = \frac{1}{p-1} \sum_{i=1}^{i=p} (x^*_{ij} - x^*_{\cdot j})^2 , \qquad j = 1, \ldots, n .   (14)
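
The four phases can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the occurrence matrix is stored with one row per RE and one column per document, variances use the (number of values - 1) denominator, and all function names are hypothetical.

```python
from statistics import mean, variance
from typing import List

Matrix = List[List[float]]  # rows = REs (index i), columns = documents (index j)


def column(x: Matrix, j: int) -> List[float]:
    return [row[j] for row in x]


def standardize_by_document(x: Matrix) -> Matrix:
    """Centre and scale every document (column) by its own mean and standard deviation."""
    n_docs = len(x[0])
    means = [mean(column(x, j)) for j in range(n_docs)]
    sds = [variance(column(x, j)) ** 0.5 for j in range(n_docs)]
    return [[(row[j] - means[j]) / sds[j] for j in range(n_docs)] for row in x]


def transform(x: Matrix, avg_word_len: List[float]) -> Matrix:
    """Return the transformed occurrences after the four phases."""
    z = standardize_by_document(x)                       # phase 1: first standardization
    v_re = [variance(row) for row in z]                  # phase 2: variation of each RE
    weighted = [[x[i][j] * v_re[i] * avg_word_len[i]     # phase 3: weighted occurrences
                 for j in range(len(x[0]))] for i in range(len(x))]
    return standardize_by_document(weighted)             # phase 4: second standardization


# Toy counts for 4 REs in 3 documents (hypothetical values).
counts = [[3.0, 0.0, 1.0], [0.0, 2.0, 5.0], [1.0, 1.0, 0.0], [4.0, 0.0, 2.0]]
z_star = transform(counts, avg_word_len=[8.0, 3.5, 5.0, 6.0])
```

After the last phase every document column has zero mean and unit variance, so a long document no longer dominates a short one. The sketch assumes that no document has constant counts, since the standard deviation would then be zero.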






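These standardized values are the transformed occurrences, and they are used to obtain the
similarity matrix between documents, whose generic element is given by

S_{jl} = \frac{1}{p-1} \sum_{i=1}^{i=p} z^*_{ij} \, z^*_{il} , \qquad j = 1, \ldots, n; \; l = 1, \ldots, n .   (15)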

2.5 Non-informative REs 

Some high-frequency REs appearing in most documents written in the same language
are not informative from the Text Mining point of view, e.g., locutions such
as Considérant que (having regard), and in particular, or other expressions which
are incorrect REs, such as of the or dans les (in the). Although these expressions
are useless for identifying document topics, they are informative for distinguishing
different languages. As a matter of fact, they occur in most documents of the
same language, and their associated variation (see (11)) is usually high or very
high, i.e., they are relevant to “approximate” documents of the same language
when calculating similarities between documents (see (12), (13) and (15)).

So, it seems that either they should be removed to distinguish topics in documents
of the same language, or they should be kept for distinguishing documents
of different languages. To solve this problem, we use the following criterion: the
REs having at least one extremity (the leftmost or the rightmost word) that
occurs in at least 90% of the documents we are working with are removed from
the initial set of REs. We follow this criterion since these expressions usually
begin or end with words occurring in most documents of the same language, e.g.,
of, les, que, etc. As we will see in Subsect. 3.3, the documents and REs with
which the system is working depend on the node of the clustering tree.
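
The removal criterion can be sketched as below. The data layout (each document reduced to the set of words it contains), the whitespace tokenization and the function name are assumptions made for the illustration.

```python
from typing import Dict, List, Set


def remove_non_informative(res: List[str], docs: Dict[str, Set[str]],
                           max_doc_ratio: float = 0.90) -> List[str]:
    """Drop REs whose leftmost or rightmost word occurs in at least
    max_doc_ratio of the documents currently being worked with.

    docs maps a document id to the set of words it contains.
    """
    n_docs = len(docs)

    def doc_ratio(word: str) -> float:
        return sum(word in words for words in docs.values()) / n_docs

    kept = []
    for expression in res:
        tokens = expression.split()
        if (doc_ratio(tokens[0]) >= max_doc_ratio
                or doc_ratio(tokens[-1]) >= max_doc_ratio):
            continue  # one extremity is a word found almost everywhere
        kept.append(expression)
    return kept


# Toy node with four documents (hypothetical content).
docs = {
    "d1": {"of", "the", "rational", "use", "energy"},
    "d2": {"of", "the", "agricultural", "products"},
    "d3": {"of", "the", "customs", "code"},
    "d4": {"of", "the", "energy", "consumption"},
}
candidates = ["of the", "rational use", "use of", "agricultural products"]
print(remove_non_informative(candidates, docs))
```

Because the ratio is computed over the documents of the current node, the same expression may be kept at the root of the tree, where several languages are mixed, and removed inside a single-language cluster.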

To summarize, in this section we obtained a matrix where a small set of 
components characterizes a group of documents. This matrix will be used as 
input for clustering. For this purpose, the matrix of document similarity (S) 
(see (4)) was calculated. Its generic element is given by (15). Then, from S, 
Q was obtained by (7) and the first k columns of Q were taken, such that
PTV(k) ≥ 0.80. Finally, the latter matrix is conveyed to the clustering software.

Considering the initial 25,838 REs for the 324 documents of the sub-ELIF
corpus, we obtained PTV(3) = .848; PTV(5) = .932 and PTV(8) = .955.

3 Clustering Documents 

We need to split documents into clusters. However, we do not know how many
clusters should be obtained. Moreover, though we have obtained k features (components)
to evaluate the documents, we know neither the composition of
each cluster, nor its volume, shape and orientation in the k-axes space.

3.1 The Model-Based Cluster Analysis 

Considering the problem of determining the structure of clustered data, without
prior knowledge of the number of clusters or any other information about
their composition, Fraley and Raftery [4] developed the Model-Based Clustering
Analysis (MBCA). In this approach, data are represented by a mixture model
where each element corresponds to a different cluster. Models with varying geometric
properties are obtained through different Gaussian parameterizations and
cross-cluster constraints. This clustering methodology is based on multivariate
normal (Gaussian) mixtures. So the density function associated with cluster c has
the form

f_c(x_i \mid \mu_c, \Sigma_c) = \frac{\exp\{-\frac{1}{2} (x_i - \mu_c)^T \Sigma_c^{-1} (x_i - \mu_c)\}}{(2\pi)^{k/2} \, |\Sigma_c|^{1/2}} .   (16)



Clusters are ellipsoidal, centered at the means μ_c; element x_i belongs to cluster
c. The covariance matrix Σ_c determines other geometric characteristics. This
methodology is based on the parameterization of the covariance matrix in terms
of its eigenvalue decomposition in the form Σ_c = λ_c D_c A_c D_c^T, where D_c is the
matrix of eigenvectors, determining the orientation of the principal components
of Σ_c. A_c is the diagonal matrix whose elements are proportional to the eigenvalues
of Σ_c, determining the shape of the ellipsoid. The volume of the ellipsoid
is specified by the scalar λ_c. Characteristics (orientation, shape and volume) of
distributions are estimated from the input data, and can be allowed to vary between
clusters, or constrained to be the same for all clusters. Considering our
application, input data is given by the k columns of matrix Q (see (7) and (8)).
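
To make the roles of volume, shape and orientation concrete, the sketch below builds a two-dimensional covariance matrix from the three ingredients and evaluates the Gaussian density. It is an illustration only: the restriction to two dimensions, the choice A = diag(1, shape_ratio) and all names are assumptions, and none of this is part of emclust.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def covariance_2d(volume: float, shape_ratio: float, angle: float) -> Matrix:
    """Build volume * D * A * D^T for a 2-D cluster.

    D is a rotation by `angle` (orientation); A = diag(1, shape_ratio) (shape).
    """
    d = [[math.cos(angle), -math.sin(angle)],
         [math.sin(angle), math.cos(angle)]]
    a = [[1.0, 0.0], [0.0, shape_ratio]]
    sigma = matmul(matmul(d, a), transpose(d))
    return [[volume * v for v in row] for row in sigma]


def gaussian_density_2d(x: List[float], mu: List[float], sigma: Matrix) -> float:
    """Bivariate normal density at x for mean mu and covariance sigma."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    inv = [[sigma[1][1] / det, -sigma[0][1] / det],
           [-sigma[1][0] / det, sigma[0][0] / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


spherical = covariance_2d(volume=2.0, shape_ratio=1.0, angle=0.7)    # rotation has no effect
elongated = covariance_2d(volume=2.0, shape_ratio=0.25, angle=0.0)   # axis-aligned ellipse
print(gaussian_density_2d([0.0, 0.0], [0.0, 0.0], spherical))
```

With shape_ratio = 1 the rotation cancels out and the matrix reduces to a multiple of the identity, which is the spherical case; any other ratio gives an ellipse whose axes follow the columns of D.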

MBCA subsumes the approach with Σ_c = λI, long known as k-means, where the
sum-of-squares criterion is used, based on the assumption that all clusters are
spherical and have the same volume (see Table 1). However, in the case of k-means,
the number of clusters has to be specified in advance, and in
many applications real clusters are far from spherical in shape. Therefore we
have chosen MBCA for clustering documents. For this purpose, the function emclust was
used with the S-PLUS package, which is available for Windows and Linux.

During the cluster analysis, emclust shows the Bayesian Information Crite- 
rion (BIC), a measure of evidence of clustering, for each “pair” model-number of 
clusters. These “pairs” are compared using BIC: the larger the BIC, the stronger 
the evidence of clustering (see [4]). The problem of determining the number of 
clusters is solved by choosing the best model. Table 1 shows the different models
used during the calculation of the best model. Models must be specified as a parameter
of the function emclust. However, usually there is no prior knowledge
about the model to choose. Then, by specifying all models, emclust gives us
BIC values for each pair model-number of clusters and proposes the best model,
which indicates which cluster must be assigned to each object (document).

Table 1. Parameterizations of the covariance matrix Σ_c in the Gaussian model and
their geometric interpretation

Σ_c                  Distribution  Volume    Shape     Orientation
λI                   Spherical     Equal     Equal     -
λ_c I                Spherical     Variable  Equal     -
λ D A D^T            Ellipsoidal   Equal     Equal     Equal
λ_c D_c A_c D_c^T    Ellipsoidal   Variable  Variable  Variable
λ D_c A D_c^T        Ellipsoidal   Equal     Equal     Variable
λ_c D_c A D_c^T      Ellipsoidal   Variable  Equal     Variable

3.2 Assessing Normality of Data. Data Transformations 

MBCA works based on the assumption of normality of data. Then, Gaussian 
distribution must be checked for the univariate marginal distributions of the 
documents on each component. Thus, a QQ-plot is made for each component. 
For this purpose, each of the first k columns of matrix Q (see (7) and (8)) is 
standardized, ordered and put on the y axis of the QQ-plot. Then, standardized
normal quantiles are generated and put on the x axis of the QQ-plot (see [5, pages
146-162] for details). Fig. 1(a) represents the QQ-plot for the 2nd component,
assessing the normality of data of cluster 1.2 (specific clusters are dealt with
in Sect. 5). This QQ-plot is representative,
since most of the components for other clusters produced similar QQ-plots.

Fig. 1. QQ-plot of data for cluster 1.2 on 2nd component (a); Chi-square plot of the
ordered distances for data in cluster 1.2 (b)

Most of the time, if QQ-plots are straight (univariate distributions are normal), the
joint distribution of the k dimensions (components) is multivariate normal.
However, multivariate normality must be tested. Then, a Chi-square plot is constructed
for each cluster. Thus, the squared distances are ordered from smallest
to largest as d²_(1) ≤ d²_(2) ≤ ... ≤ d²_(m), where d²_(j) = (x_j − x̄_c)^T S_c^{-1} (x_j − x̄_c).
Vector x_j is the jth element (document) of cluster c; x̄_c is the means vector for
the k dimensions of that cluster, and S_c^{-1} is the inverse of the estimator of the
cluster covariance matrix. Then the pairs (d²_(j), χ²_k((j − 1/2)/m)) are graphed,
where m is the number of elements of the cluster and χ²_k((j − 1/2)/m) is the
100(j − 1/2)/m percentile of the Chi-square distribution with k (the number of
components, see (8)) degrees of freedom. The plot should resemble a straight
line. A systematic curved pattern suggests lack of normality. Since the straightness
of the univariate and multivariate plots is not perfect (see Fig. 1(a) and
Fig. 1(b)), some transformations of the data were tried to approximate
normality. Transformations were applied to some of the k columns of matrix Q (see
(7) and (8)): those whose QQ-plot suggested lack of normality. So, let y be an
arbitrary element of a given column we want to transform. We considered the
following slightly modified family of power transformations from y to y^(λ) [6]:



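y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} , & \lambda \neq 0 \\ \ln y , & \lambda = 0 \end{cases}   (17)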



Power transformations are defined only for positive variables. However this is not
restrictive, because a single constant can be added to each element in the column
if some of the elements are negative. Thus, given the elements y_1, y_2, ..., y_m in
a given column, the Box-Cox [6] solution for the choice of an appropriate power
λ for that column is the one which maximizes the expression

l(\lambda) = -\frac{m}{2} \ln\left[ \frac{1}{m} \sum_{j=1}^{j=m} \left( y_j^{(\lambda)} - \overline{y^{(\lambda)}} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^{j=m} \ln y_j   (18)

where \overline{y^{(\lambda)}} = \frac{1}{m} \sum_{j=1}^{j=m} y_j^{(\lambda)}   (19)

and m is the number of elements of the cluster; y^(λ) is defined in (17). So, every
element y of the ith column of matrix Q will be transformed from y to y^(λ_i)
according to (17), where λ = λ_i, i.e., the value of λ maximizing l(λ) for that column. This new
matrix was conveyed to the clustering software. However, the results obtained were
not better than those obtained with the non-transformed matrix, as we will see in Sect. 5.
Therefore, although the straightness of the plots in Fig. 1(a) and Fig. 1(b) is
not perfect, as it should be to ensure normality, it is close enough to encourage
us to use MBCA as an adequate approach for clustering this kind of data.
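
The choice of the power for one column can be sketched as a grid search. The sketch assumes the standard Box-Cox family (the logarithm for a zero power, (y^λ − 1)/λ otherwise) and the usual profile log-likelihood; the grid, the toy column and the function names are hypothetical.

```python
import math
from typing import Sequence


def box_cox(y: float, lam: float) -> float:
    """Power transformation of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def log_likelihood(ys: Sequence[float], lam: float) -> float:
    """Box-Cox criterion l(lambda) for one column of positive values."""
    m = len(ys)
    t = [box_cox(y, lam) for y in ys]
    t_bar = sum(t) / m
    s2 = sum((v - t_bar) ** 2 for v in t) / m
    return -0.5 * m * math.log(s2) + (lam - 1.0) * sum(math.log(y) for y in ys)


def best_lambda(ys: Sequence[float], grid: Sequence[float]) -> float:
    """Value of lambda on the grid that maximizes l(lambda)."""
    return max(grid, key=lambda lam: log_likelihood(ys, lam))


# Toy column whose logarithms are symmetric around zero (hypothetical values),
# so the logarithmic transformation should be preferred.
column = [math.exp(u) for u in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
grid = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
print(best_lambda(column, grid))
```

A column containing negative values would first be shifted by a constant so that every element is positive, as described above.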



3.3 Sub-clusters 

As we will see in Sect. 5, our approach organizes the sub-ELIF corpus into 3 main clusters:
English, Portuguese and French documents. However, we can distinguish
different subjects in different documents of the same cluster. So, a hierarchical 
tree of clusters is built as follows: let us consider that every cluster in the tree 
is a node. For every node, non-informative REs are removed (see Subsect. 2.5) 
from the set of REs contained in the documents of that node (a subset of the 
original REs), in order to obtain a new similarity matrix between documents 
(see (15)). Then, the first k columns of the new matrix Q are taken (see (7) and 
(8)), and new clusters are proposed by MBCA. 

3.4 Choosing the Best Number of Clusters

As has been said, MBCA calculates the best model based on the first k columns
of matrix Q (k components). A large number of components means no significant
information loss, which is important for a correct clustering (best model) to be
proposed by MBCA. On the other hand, a large number of components must
be avoided, since it forces MBCA to estimate large covariance matrices - during
the internal computation for different models - which may be judged to be close
to singular (see [4]). Therefore, we use the following criterion: the first k
components are chosen in such a way that PTV(k) ≥ 0.80 (see (8)).

Then, based on those k components, MBCA produces a list of BIC values 
for each model: VVV (Variable volume, Variable shape, Variable Orientation), 
EEV etc. (see Table 1). Each list may have several local maxima. The largest 
local maximum over all models is usually proposed as the best model. However, 
a heuristic that works well in practice (see [4]) chooses the number of clusters 
corresponding to the first decisive local maximum over all the models considered. 
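
The two selection rules can be contrasted with a small sketch. It assumes that BIC is used in its "larger is better" form, 2 log L − m log n, with m free parameters and n observations. The BIC table is made up, the function names are hypothetical, and the notion of a "decisive" maximum is simplified here to the first strict local maximum of a single model's list.

```python
import math
from typing import Dict, List, Tuple


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC in the 'larger is better' form assumed in this sketch."""
    return 2.0 * log_lik - n_params * math.log(n_obs)


def best_pair(table: Dict[str, List[float]]) -> Tuple[str, int]:
    """Return the (model, number of clusters) pair with the largest BIC.

    table[model][g - 1] holds the BIC obtained with g clusters.
    """
    model, values = max(table.items(), key=lambda kv: max(kv[1]))
    return model, values.index(max(values)) + 1


def first_local_maximum(values: List[float]) -> int:
    """Number of clusters at the first local maximum of one model's BIC list."""
    for g in range(len(values)):
        left_ok = g == 0 or values[g] > values[g - 1]
        right_ok = g == len(values) - 1 or values[g] > values[g + 1]
        if left_ok and right_ok:
            return g + 1
    return len(values)


# Made-up BIC values for 1 to 5 clusters under two models.
table = {
    "EEV": [-950.0, -900.0, -820.0, -840.0, -810.0],
    "VVV": [-960.0, -910.0, -830.0, -850.0, -860.0],
}
print(best_pair(table))                   # overall largest BIC
print(first_local_maximum(table["EEV"]))  # first local maximum for EEV
```

In this toy table the overall maximum asks for five clusters while the first local maximum stops at three, which shows how the heuristic can lead to a smaller number of clusters.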



4 Summarization 



Summarizing a document and summarizing a cluster of documents are different 
tasks. As a matter of fact, documents of the same cluster have common REs 
such as rational use of energy or energy consumption, rather than long sentences 
which are likely to occur in just one or two documents. Then, summarizing topics 
seems adequate to disclose the core content of each cluster. 

Cluster topics correspond to the most important REs in the cluster. Let the
cluster whose topics we want to extract be the “target cluster”. Then,
in order to extract them, first the REs of the parent node of the target cluster are
ordered according to the value Score(RE_i) assigned to the ith RE:

Score(RE_i) = Thr(RE_i) \cdot V(RE_i) \cdot AL(RE_i)   (20)

where

Thr(RE_i) = \begin{cases} 1 , & SCP\_f(RE_i) \geq threshold \\ 0 , & \text{else} \end{cases}   (21)



V(RE_i) and AL(RE_i) have the same meaning as in (12). Thus, Thr(·) corresponds
to a filter that “eliminates” REs whose SCP_f(·) cohesion value (see Sect.
2) is lower than threshold - a value empirically set to 0.015. These REs, e.g.,
Considérant que, and in particular, etc., are not informative for IR / Text Mining.
Most of the time, these REs have already been eliminated when selecting the informative
REs for calculating the covariance matrix; however, this may not happen
in the case of a multilingual set of documents (see Subsect. 2.5).

So, the largest Score(RE_i) corresponds to the most important RE. For example,
according to this criterion, the 15 most important REs of the initial
cluster (the one containing all documents) are the following: Nomenclatura
Combinada (Combined Nomenclature), nomenclature combinée, Member States,
combined nomenclature, États membres (Member States), Council Regulation,
produtos agrícolas (agricultural products), produits agricoles, utilização racional
(rational use), nomenclature tarifaire (tariff nomenclature), autoridades aduaneiras
(customs authorities), estância aduaneira (customs office), Common Customs,
indicados na coluna (indicated in column) and agricultural products.

Now, taking for instance the “French documents” as the target cluster, we
cannot “guarantee” that produits agricoles will be a topic, since not every French
document is about produits agricoles. On the other hand, the same topic
often appears in different documents, written in different forms (e.g. produits
agricoles and Produits Agricoles). Hence, according to Score(·), the 15 most
important REs of the target cluster occurring in at least 50% of its documents
are put in a list. From this list, the REs with a Score(·) value not lower than 1/50
of the maximum Score(·) value obtained from that list are considered topics.
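
The topic selection can be sketched as follows. The record layout, the field names and the toy values are assumptions; in particular, the sketch takes the cohesion value, the variation, the average word length and the document coverage of each RE as already computed.

```python
from typing import List, NamedTuple


class RE(NamedTuple):
    text: str
    scp_f: float         # cohesion value of the expression
    variation: float     # V(RE_i), computed on the parent node
    avg_word_len: float  # AL(RE_i), average number of characters per word
    doc_ratio: float     # fraction of the target cluster's documents containing the RE


def score(expr: RE, threshold: float = 0.015) -> float:
    """Score = Thr * V * AL, where Thr filters out low-cohesion expressions."""
    thr = 1.0 if expr.scp_f >= threshold else 0.0
    return thr * expr.variation * expr.avg_word_len


def topics(res: List[RE], top_n: int = 15, min_doc_ratio: float = 0.50,
           score_fraction: float = 1 / 50) -> List[str]:
    """Top-scoring REs that are frequent enough in the target cluster."""
    frequent = [r for r in res if r.doc_ratio >= min_doc_ratio]
    ranked = sorted(frequent, key=score, reverse=True)[:top_n]
    if not ranked:
        return []
    cutoff = score(ranked[0]) * score_fraction
    return [r.text for r in ranked if score(r) > 0 and score(r) >= cutoff]


# Toy REs (hypothetical values).
toy = [
    RE("rational use of energy", 0.20, 4.0, 5.0, 0.9),
    RE("energy consumption", 0.10, 3.0, 8.5, 0.6),
    RE("of the", 0.01, 9.0, 2.5, 1.0),          # cohesion below the threshold
    RE("general rules", 0.05, 0.05, 6.0, 0.7),  # score below 1/50 of the maximum
    RE("customs code", 0.30, 5.0, 5.5, 0.2),    # present in too few documents
]
print(topics(toy))
```

The 1/50 cut-off keeps the list short when the best scores are far above the rest, while the 50% coverage rule prevents an RE that is characteristic of a few documents from being presented as a topic of the whole cluster.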

5 Results 

Sub-ELIF is a multilingual corpus with 108 documents per language. For each
document there are two other documents which are its translations into the other
languages. Table 2 shows the hierarchical tree of clusters obtained by
this approach. We also present the topics extracted from each cluster.



Table 2. Evaluation of the clusters

Cluster  Main topic               Correct #  Total #  Act. corr. #  Prec. (%)  Rec. (%)
1        european communities     108        108      108           100        100
2        Comunidade Europeia      108        107      107           100        99.1
3        Communauté européenne    108        109      108           99.1       100
1.1      rational use of energy   23         23       20            86.9       86.9
1.2      agricultural products    27         27       21            77.8       77.8
1.3      combined nomenclature    58         58       51            87.9       87.9
2.1      economia de energia      23         26       21            80.8       91.3
2.2      produtos agrícolas       27         25       21            84         77.8
2.3      Nomenclatura Combinada   58         56       52            92.9       89.7
3.1      politique énergétique    23         26       22            84.6       95.7
3.2      produits agricoles       27         27       21            77.8       77.8
3.3      nomenclature combinée    58         56       53            94.6       91.4

Cluster 1: European Communities, Member States, EUROPEAN COMMUNITIES,
Council Regulation, Having regard to Council Regulation, Having regard
to Council and Official Journal.

Cluster 2: Comunidade Europeia, Nomenclatura Combinada, COMUNIDADES
EUROPEIAS and directamente aplicável (directly applicable).

Cluster 3: Communauté européenne, nomenclature combinée, États membres,
COMMUNAUTÉS EUROPÉENNES and directement applicable.

Cluster 1.1: rational use of energy, energy consumption and rational use.

Cluster 1.2: agricultural products, Official Journal, detailed rules, Official
Journal of the European Communities, proposal from the Commission, publication
in the Official Journal and entirely and directly.

Cluster 1.3: combined nomenclature, Common Customs, customs authorities,
No 2658/87, goods described, general rules, appropriate CN, Having regard
to Council Regulation, tariff and statistical and Customs Code.

Cluster 2.1: economia de energia (energy saving), utilização racional, racional
da energia (rational of energy) and consumo de energia (energy consuming).

Cluster 2.2: produtos agrícolas, Comunidades Europeias, Jornal Oficial, directamente
aplicável, COMUNIDADES EUROPEIAS, Jornal Oficial das Comunidades,
directamente aplicável em todos os Estados-membros (directly applicable
to all Member States), publicação no Jornal Oficial, publicação no Jornal Oficial
das Comunidades and Parlamento Europeu (European Parliament).

Cluster 2.3: Nomenclatura Combinada, autoridades aduaneiras (customs
authorities), indicados na coluna, mercadorias descritas, informações pautais
vinculativas (binding tariff information), Aduaneira Comum (Common Customs),
regras gerais, códigos NC (NC codes) and COMUNIDADES EUROPEIAS.

Cluster 3.1: politique énergétique (energy policy), rationnelle de énergie and
l’utilisation rationnelle.

Cluster 3.2: produits agricoles, organisation commune, organisation commune
des marchés, directement applicable, Journal officiel, Journal officiel des
Communautés and COMMUNAUTÉS EUROPÉENNES.

Cluster 3.3: nomenclature combinée, autorités douanières (customs authorities),
nomenclature tarifaire, No 2658/87, marchandises décrites, tarifaire et
statistique (tariff and statistical) and COMMUNAUTÉS EUROPÉENNES.

5.1 Discussion 

MBCA (function emclust) proposed clusters 1, 2 and 3 considering EEV as the
best model (see Table 1). The EEV model was also considered the best model when the
sub-clusters of clusters 1, 2 and 3 were calculated. Clusters were obtained with
the original data, i.e. without the transformations intended to approximate
normality (see Subsect. 3.2).

In Table 2, column Main topic means the most relevant topic according
to (20) obtained for the cluster indicated by column Cluster; by Correct # we
mean the correct number of documents in the corpus for the topic in Main
topic; Total # is the number of documents considered to belong to the cluster
by our approach; Act. corr. # is the number of documents correctly assigned to
the cluster; Prec. (%) and Rec. (%) are Precision and Recall.
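
As a check on how the last two columns relate to the counts, Precision can be read as Act. corr. # divided by Total # and Recall as Act. corr. # divided by Correct #. The short sketch below applies this reading to the row of cluster 2.1; the function name is hypothetical.

```python
from typing import Tuple


def precision_recall(correct: int, total: int, actually_correct: int) -> Tuple[float, float]:
    """Precision and Recall, in percent, from the three counts of a Table 2 row."""
    precision = 100.0 * actually_correct / total
    recall = 100.0 * actually_correct / correct
    return precision, recall


# Row of cluster 2.1: Correct # = 23, Total # = 26, Act. corr. # = 21.
p, r = precision_recall(correct=23, total=26, actually_correct=21)
print(f"{p:.1f} {r:.1f}")   # 80.8 91.3
```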

Although the original texts of the sub-ELIF corpus are classified by main
topic areas, e.g., Agriculture, Energy - Rational use, etc., we removed that
information from the documents before extracting the REs using LocalMaxs, as
we wanted to test our approach for clustering ordinary documents. Nevertheless, the
topics shown in the previous lists capture the core content of each cluster. However,
each cluster is not a perfect “translation” of the “corresponding” clusters
in the other languages. Some reasons may help to explain why. For instance, because in
Portuguese we write Estado-Membro (just one word) for Member State, it will
not be in cluster 2, since LocalMaxs does not extract unigrams. Moreover, not
every RE has the same number of occurrences as the corresponding RE in the
other languages. For example, there are 55 occurrences of Nomenclatura Combinada,
54 of nomenclature combinée, 45 of combined nomenclature and 10 of
Combined Nomenclature. Then, under the criterion used for extracting topics
(see Sect. 4), 45 occurrences is less than 50% of the 108 documents of cluster
1. Therefore combined nomenclature is not considered a topic in cluster 1.

As we keep the original words in text, the same topic may be shown in different
forms, e.g., EUROPEAN COMMUNITIES and European Communities.
Though some REs are not topics, or are just weakly informative expressions, such as
Journal officiel or general rules, we think that about 80% of these REs can be
considered correct topics / subtopics, exposing the core content of the clusters.

Since our approach is not oriented towards any specific language, we believe that
different numbers of occurrences of the same concept in the three languages are
the main reason for the different Precision and Recall scores when comparing
“corresponding” clusters. Furthermore, some documents have REs written in more than one
language; for example, there is a “Portuguese document” containing some important
REs written in French. This reduced the Precision of cluster 3 and the Recall
of cluster 2, since this document was oddly put in cluster 3.

We also transformed the data to approximate normality (see Subsect. 3.2) and
used them as input to the clustering software. Unfortunately, the modified data caused
a decrease in Precision (about 20% lower), and the number of sub-clusters for
clusters 1, 2 and 3 was different. However, we will work again on other data
transformations with larger corpora in order to obtain better results.

As has been said in Sect. 3.1, the larger the BIC value, the stronger the
evidence of clustering (see [4]). Then, BIC can be used to decide whether to
“explore” sub-clusters or not. However, we have not worked on this problem yet.



6 Related Work 

Some known approaches for extracting topics and relevant information use mor- 
phosyntactic information, e.g., TIPSTER [7]. So, these approaches would need 
specific morphosyntactic information in order to extract topics from documents 
in other languages, and that information might not be available. 

In [8], a multidocument summarizer called MEAD is presented. It generates
summaries using a cluster topic detection and tracking system. However, in this
approach, topics are unigrams. Though many uniwords have precise meanings,
multiword topics are usually more informative and specific than unigrams.

In [9], an approach for clustering documents is presented. It is assumed that 
there exists a set of topics underlying the document collection to cluster, since 
topics are not extracted from the documents. When the number of clusters is 
not given to this system, it is calculated based on the number of topics. 

7 Conclusions

We presented an unsupervised, statistics-based and language-independent approach
for document clustering and topic extraction. It was applied to a multilingual
corpus, using just information extracted from it. Thousands of REs were
extracted by the LocalMaxs algorithm and then transformed into a small set of new
features, which - according to the results obtained - showed good document classification
power. The best number of clusters was automatically calculated by
MBCA and the results obtained led to a rather precise document clustering.
Thus, the number of clusters was not left to the user's choice, as it might correspond
to an unnatural clustering. Although we tested this approach on a small
corpus (872,795 words), the results are encouraging, since about 80% of the clusters'
REs are sufficiently informative for being taken as topics of documents. This
leads us to believe that topics, rather than long sentences belonging to just one
or two documents, are adequate to define the clusters' core content.

We tried some data transformations when some distributions suggested lack
of normality. However, considering the clustering software we used, the preliminary
results showed that the negative effect of these data modifications seems to
be more important than the positive effect they produce to make distributions
more “normal (Gaussian) looking”. Further research on larger corpora is
therefore required to solve this and other problems.

Abstract. When facing the need to select the most appropriate algo-
rithm to apply on a new data set, data analysts often follow an approach 
which can be related to test-driving cars to decide which one to buy: 
apply the algorithms on a sample of the data to quickly obtain rough 
estimates of their performance. These estimates are used to select one 
or a few of those algorithms to be tried out on the full data set. We 
describe sampling-based landmarks (SL), a systematization of this ap- 
proach, building on earlier work on landmarking and sampling. SL are 
estimates of the performance of algorithms on a small sample of the data 
that are used as predictors of the performance of those algorithms on the 
full set. We also describe relative landmarks (RL), that address the in- 
ability of earlier landmarks to assess relative performance of algorithms. 
RL aggregate landmarks to obtain predictors of relative performance. 
Our experiments indicate that the combination of these two improvements,
which we call Sampling-based Relative Landmarks, is better for
ranking than traditional data characterization measures.



1 Introduction 

With the growing number of algorithms available for data mining, the selection 
of the one that will obtain the most useful model from a given set of data is 
an increasingly important task. Cross-validation with adequate statistical com- 
parisons is currently the most reliable method for algorithm selection. However, 
with the increasing size of the data to be analyzed, it is impractical to try all 
available alternatives. 

The most common approach to this problem is probably the choice of the 
algorithm based on the experience of the user on previous problems with similar 
characteristics. One drawback with this approach is that it is difficult for the user 
to be knowledgeable about all algorithms. Furthermore, two sets of data may 
look similar to the user but they can be sufficiently different to cause significant 
changes in algorithm performance. 

In meta-learning, the idea of using knowledge about the performance of al- 
gorithms on previously processed data sets is systematized [1,5]. A database of 
meta-data, i.e., information about the performance of algorithms on data sets 
and characteristics of those data sets, is used to provide advice for algorithm
selection, given a new problem. Most approaches to meta-learning have concen- 
trated on selecting one or a set of algorithms whose performance is expected 
to be the best [1,12]. However, taking into account that no prediction model 
is perfect and that in most meta-learning settings the amount of meta-data is 
not very large, there is a growing interest in methods that provide a ranking 
of the algorithms according to their expected performance [3,9,14]. The ranking 
approach is more flexible, enabling the user to try out more than one algorithm 
depending on the available resources as well as to accommodate his/her personal 
preferences. Probably the most important issue in meta-learning is how to obtain 
good data characteristics, i.e. measures that provide information about the per- 
formance of algorithms and that can be computed fast. The earlier approaches 
used three types of measures, general (e.g. number of examples), statistical (e.g. 
number of outliers) and information-theoretic (e.g. class entropy) [10, sec. 7.3]. 
Landmarks were recently suggested as data characterization measures [2,12]. A 
landmark is a simple and efficient algorithm that provides information about 
the performance of more complex algorithms. For example, the top node in a 
decision tree can be an indicator of the performance of the full tree. 

Another common approach to the problem of algorithm selection is sampling, 
which consists of drawing a (random) sample from the data and trying at least 
a few of the available algorithms on it. The algorithm performing best on this 
sample is then applied to the full data set. Petrak [11] performed a systematic 
study of this approach in the supervised classification setting. Even with simple 
sampling strategies, positive results were obtained on the task of single algorithm 
selection. However, the learning curves presented in that work indicate that large 
samples are required to obtain a clear picture of the relative performance between 
the algorithms. In this paper we combine both ideas, landmarks and sampling, in 
a framework for data characterization, called sampling-based landmarks (SL). In 
SL the results of algorithms on small samples are not used directly as estimators 
of algorithm performance, as it is done in sampling. They serve as predictors 
in the same spirit of landmarking, i.e. as data characteristics used in the meta- 
learning process. 

Although landmarking has also shown positive results for single algorithm 
selection [2,12], it does not provide much information about the relative per- 
formance of algorithms, as recent results show [3]. To address this problem, we 
use relative landmarks (RL). RL aggregate landmarks to obtain measures that 
are predictors of relative, rather than individual, performance. We believe they 
will be more appropriate for ranking and algorithm selection in general than 
traditional (non-relative) landmarks. The experiments performed in this paper 
combine these two improvements to obtain Sampling-based Relative Landmarks. 

When providing support to data mining users, one must take into account 
that the evaluation of the model is not only based on its accuracy. Here we will 
work in the setting defined in [14], where the accuracy and total execution time 
of supervised classification algorithms are combined. 

In the next section we present sampling-based landmarks. In Section 3 we
then describe relative landmarks and how to implement them using the Adjusted 
Ratio of Ratios, which is a multicriteria evaluation measure for classification 
algorithms. In Section 4 we combine both concepts into sampling-based relative 
landmarks and compare them empirically with traditional data characterization 
measures for ranking. Related work is discussed in Section 5 and in the final 
section we present some conclusions and describe future work. 

2 Sampling-Based Landmarks 

Landmarks were introduced in [2,12] as fast and simple learning algorithms 
whose performance characteristics can be used to predict the performance of 
complex algorithms. The simplicity of the chosen algorithms was necessary in 
order to achieve acceptable speed: meta-learning will only be an interesting al- 
ternative to cross-validation if data set characterization is significantly faster 
than running the algorithms. As an alternative to simplifying algorithms, we 
can achieve speed by reducing the size of the data. A sampling-based landmark 
is an estimate of the performance of an algorithm, obtained by training a model 
with that algorithm on a small sample of a data set and testing it on another 
small sample of the same data. 

Different sampling strategies have been described in the literature to get good 
performance estimates by using only a small amount of data. Simple strategies 
include sampling a fixed number or a fixed fraction of cases. Other strategies try 
to estimate the number of cases needed by analyzing certain statistical properties 
of the sample [7,8]. Still more elaborate strategies try to find an optimum sample 
size by repeated evaluation of increasing samples [13]. 

Here, we use the simple approach of randomly selecting a constant-size sample 
of 100 cases for training and a fixed fraction of 10% for testing. Thus, the 
(usually critical) training effort is reduced to constant time complexity, while 
testing time is reduced by a constant factor (sampling itself is comparably fast, 
with linear time complexity for sequential access files). Furthermore, the results 
presented in [11] indicate that the quality of the estimates depends more on the 
size of the test sample than on the size of the training sample. 

The use of Sampling-based Landmarks is not restricted to predicting accu- 
racy. For instance, the time to obtain the landmark can also be used to predict 
the time to run the algorithm on the full set of data and the same applies to the 
complexity of the model. As explained below, here we concentrate on accuracy 
and time. 
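The procedure described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy majority-class learner and the synthetic data are hypothetical, and plain random sampling is used in place of any stratification.

```python
import random
import time
from collections import Counter


def majority_class_learner(train):
    """Toy learner (hypothetical): always predicts the most frequent label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def sampling_based_landmark(data, learner, n_train=100, test_fraction=0.10, seed=0):
    """Return (accuracy, seconds) for `learner` trained on a small sample.

    data    : list of (features, label) pairs
    learner : function mapping a training list to a predict(x) function
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Constant-size training sample, fixed fraction of the rest for testing.
    train, rest = shuffled[:n_train], shuffled[n_train:]
    n_test = max(1, int(len(rest) * test_fraction))
    test = rest[:n_test]

    start = time.perf_counter()
    predict = learner(train)
    correct = sum(predict(x) == y for x, y in test)
    elapsed = time.perf_counter() - start
    return correct / len(test), elapsed


# Synthetic data set with 1500 cases, 70% of which belong to class "a".
data = [((i,), "a" if i % 10 < 7 else "b") for i in range(1500)]
accuracy, seconds = sampling_based_landmark(data, majority_class_learner)
```

Both returned values can serve as landmarks: the accuracy for predicting accuracy on the full data, and the elapsed time for predicting run time.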



3 Relative Landmarks 

Landmarks can be generally useful as indicators of algorithm performance. How- 
ever, they do not provide information about the relative performance of algo- 
rithms, as is required for algorithm selection and, in particular, for ranking [3]. 
To provide relative performance information, landmarks should be aggregated 
in some way, leading to the notion of relative landmarks (RL). Here RL are 
based on the Adjusted Ratio of Ratios (ARR), a multicriteria evaluation measure 
combining accuracy and time to assess relative performance [14]. In ARR 
the compromise between the two criteria involved is given by the user in the 
form “the amount of accuracy I’m willing to trade for a 10 times speed-up is 
X%”. The ARR of algorithm i when compared to algorithm j on data set d is 
defined as follows: 



$$ARR^d_{i,j} = \frac{A^d_i / A^d_j}{1 + \log\left(T^d_i / T^d_j\right) \cdot X}$$



where $A^d_i$ and $T^d_i$ are the accuracy and time of i on d, respectively. Assuming, 
without loss of generality, that algorithm i is more accurate but slower than 
algorithm j on data set d, the value of $ARR^d_{i,j}$ will be 1 if i is X% more accurate 
than j for each order of magnitude that it is slower than j. $ARR^d_{i,j}$ will be larger 
than one if the advantage in accuracy of i is greater than X% for each additional 
order of magnitude in execution time and vice-versa. 

When more than two algorithms are involved, we can generate a relative 
landmark for each of the n landmarks, l, based on ARR in the following way: 



$$rl^d_i = \frac{\sum_{j \neq i} ARR^d_{i,j}}{n - 1} \qquad (1)$$



Each relative landmark, $rl^d_i$, represents the relative performance of landmark i 
when compared to all other landmarks on data set d¹. Therefore, a set of relative 
landmarks, $rl^d_i$, can be used directly as meta-features, i.e. characteristics of the 
data set, for ranking. Given that the main purpose of relative landmarks is 
characterizing the relative performance of the algorithms, we can alternatively 
order the values of the relative landmarks, and use the corresponding ranks, 
$r_{rl^d}(i)$, as meta-features. 
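A minimal sketch of how ARR, the relative landmarks of Eq. 1 and their ranks might be computed is given below. The function names and the example figures are hypothetical. The base-10 logarithm is an assumption, chosen because the text speaks of orders of magnitude, and X is passed as a fraction (0.10 for 10%).

```python
import math


def arr(acc_i, time_i, acc_j, time_j, x):
    """Adjusted Ratio of Ratios of algorithm i versus algorithm j.

    x is the accuracy traded for a 10x speed-up, as a fraction.
    The base-10 logarithm is an assumption of this sketch.
    """
    return (acc_i / acc_j) / (1.0 + math.log10(time_i / time_j) * x)


def relative_landmarks(acc, times, x):
    """Eq. 1: mean ARR of each algorithm against all the others."""
    n = len(acc)
    return [
        sum(arr(acc[i], times[i], acc[j], times[j], x)
            for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def ranks(scores):
    """Rank 1 goes to the largest score (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


# Hypothetical landmark results for three algorithms on one data set.
acc = [0.80, 0.88, 0.70]
times = [1.0, 10.0, 0.1]
rl = relative_landmarks(acc, times, x=0.10)
rl_ranks = ranks(rl)
```

With X = 10%, an algorithm that is ten times slower but has 10% higher accuracy gets an ARR of 1 against its faster competitor, which matches the interpretation given in the text.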



4 Experiments with Sampling-Based Relative Landmarks 

The aim of our experiments was to compare data set characterization by tradi- 
tional measures (general, statistical and information-theoretic) and by sampling- 
based relative landmarks (SRL), i.e. a combination of both improvements intro- 
duced above. We will refer to the SRL as SRL-scores, their ranks as SRL-ranks 
and the traditional data characterization as DC-GSI. 

Next we will describe the experimental setting used to obtain the meta-data. 
Then, we describe the meta-learning methods used and, finally, the results. 

Meta-Data: SRL are obtained very simply by creating relative landmarks 
(Section 3) using the performance of the candidate algorithms on a sample of 
the data (Section 2). Assuming that s(d) is the sampling method used and n is the 
number of algorithms, we calculate one SRL, $rl^{s(d)}_i$, for each algorithm i using 
Eq. 1, thus obtaining a set of n SRL measures. Here we have used a simple 
sampling strategy: train on a stratified random sample with 100 cases and test 
on 10% of the rest of the data set. 

¹ Since the relative landmarks, $rl^d_i$, depend on the X parameter and on the set of 
algorithms used, we store raw performance information, $A^d_i$ and $T^d_i$, and calculate 
the relative landmarks on-the-fly. 

We have used 45 data sets with more than 1000 cases, mostly from the UCI 
repository [4] but also a few others that are being used on the METAL project² 
(SwissLife’s Sisyphus data and a few applications provided by DaimlerChrysler). 
Ten algorithms were executed on these data sets³: two decision tree classifiers, 
C5.0 and Ltree, which is a decision tree that can introduce oblique decision 
surfaces; the IB1 instance-based and the naive Bayes classifiers from the MLC++ 
library; a local implementation of the multivariate linear discriminant; two 
neural networks from the SPSS Clementine package (Multilayer Perceptron and 
Radial Basis Function Network); two rule-based systems, C5.0 rules and 
RIPPER; and an ensemble method, boosted C5.0. The algorithms were all executed 
with default parameters. Since we have ten algorithms, we generated a set of ten 
SRL measures and another with the corresponding ranks for each data set. In 
the DC-GSI setting we have used the 25 meta-features that have performed well 
before [14]. 

Meta-learning Methods: Two ranking methods were used. The first is the 
Nearest-Neighbor ranking using ARR as multicriteria performance measure [14]. 
Three scenarios for the compromise between accuracy and time were considered, 
X ∈ {0.1%, 1%, 10%}, for increasing importance of time. To obtain an overall 
picture of the performance of the ranking method with the three different sets of 
data characterization measures, we varied the number of neighbors (k) from 1 to 
25. The second meta-learning method consists of building the ranking directly 
from the sampling-based relative landmarks. In other words, the recommended 
ranking for data set d is defined by the corresponding SL-ranks. This 
method can be regarded as a baseline that measures the improvement obtained 
by using sampling-based relative landmarks as predictors, i.e. as meta-features 
used by the NN ranking method, rather than directly as estimators for ranking. 
This method is referred to as SL-ranking. 

The evaluation methodology consists of comparing the ranking predicted by 
each of the methods with the estimated true ranking, i.e. the ranking based 
on the estimates of the performance of the algorithms on the full set of data 
[14]. The distance measure used is average Spearman’s rank correlation that 
yields results between 1 (for perfect match) and -1 (inverted rankings). The 
experimental methodology is leave-one-out. 
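The evaluation measure can be illustrated with a short sketch. It uses the standard formula for Spearman's coefficient on rankings without ties; the `recommend` callback is a hypothetical placeholder for whichever ranking method is being evaluated, not part of the original system.

```python
def spearman(rank_a, rank_b):
    """Spearman's rank correlation for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)


def leave_one_out_score(true_rankings, recommend):
    """Average correlation when each data set is held out in turn.

    recommend(index) must return the ranking predicted for data set
    `index` using only the other data sets (hypothetical callback).
    """
    scores = [spearman(recommend(i), true_rankings[i])
              for i in range(len(true_rankings))]
    return sum(scores) / len(scores)


identical = spearman([1, 2, 3, 4], [1, 2, 3, 4])
inverted = spearman([1, 2, 3, 4], [4, 3, 2, 1])
```

Identical rankings give the maximum value and fully inverted rankings the minimum, as stated above.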

Results: The results are presented in Figure 1. The first observation is that in 
all scenarios of the accuracy/time compromise, SL-ranking always achieves worse 
results than the alternatives. This is consistent with an observation that can be 
made from previous work on sampling [11], namely that relatively large sample 
sizes are required to correctly rank the algorithms directly from the performance 
on the sample. 

² Esprit Long-Term Research Project (#26357) A Meta-Learning Assistant for 
Providing User Support in Data Mining and Machine Learning (www.metal-kdd.org). 

³ References for these algorithms can be found in [11]. 



[Figure 1: plots not recoverable. Surviving panel label: X = 0.1. Legend: DC-GSI, 
SRL-scores, SRL-ranks, SL-ranking.] 

Fig. 1. Average Spearman rank correlation coefficient ($r_S$) obtained by NN/ARR 
ranking using two types of sampling-based relative landmarks (SRL-scores and 
SRL-ranks) and traditional data characterization (DC-GSI) for varying number of 
neighbors (k) in three different scenarios of the compromise accuracy/time. Also 
included is the performance of the ranking obtained directly from the 
sampling-based relative landmarks (SL-ranking). 



These results also indicate that SRL-scores represent the best overall method, 
especially for a reasonable number of neighbors (k < 10, which represents roughly 
20% or less of the number of data sets), except when time is the dominant 
criterion. In the latter case and for k < 4, DC-GSI performs better. This is 
expected because for such a small sample size, the precision (number of decimal 
places) used to measure time is not sufficient to be discriminative. However, it 
is interesting to note that in the same scenario, SRL-scores is again comparable 
to or better than DC-GSI for 4 < k < 10. 

Focussing the comparison on both types of SRL, SRL-scores and SRL-ranks, 
we observe that the former seem to be much more stable with smaller numbers of 
neighbors. We believe that this makes this approach more appealing, although in 
the scenario where accuracy is the dominant criterion, the performance obtained 
with SRL-ranks is better for k ∈ {4, 5, 6}. There is also a tendency for SRL-ranks 
to be better than SRL-scores for larger, less common, numbers of neighbors 
(10 < k < 25). These results are somewhat surprising, given that SRL-scores 
are expected to be more sensitive to outliers. We would expect to mitigate this 
problem by using the information about the performance of algorithms at a more 
coarse level, i.e. SRL-ranks. However, we seem to lose too much information with 
the latter. 

5 Related Work 

The concepts of relative landmarks and subsampling landmarks, which are 
essentially the same as the ones described here, have been presented by Fürnkranz 
et al. [6]. The authors also present an interesting taxonomy of relative landmarks, 
that includes both types used here, i.e. using scores or their ranks. However, 
there are two main differences between the two approaches, the most important 
being that we combine both concepts, obtaining sampling-based relative land- 
marks. Also, in [6] the aim is to select a single algorithm while we are concerned 
with ranking the candidate algorithms. 

6 Conclusions 

Landmarks have been introduced in [2,12] as simple and efficient algorithms 
that are useful in predicting the performance of more complex algorithms. We 
observe that efficiency can be obtained not only by simplifying algorithms but 
also by sampling the data. Sampling-based landmarks are obtained by running 
the candidate algorithms on small samples of the data and are used as data set 
characterization measures rather than as estimators of algorithm performance 
[11]. Landmarks have little ability to predict relative performance, which makes 
them less suitable for ranking, as was shown empirically [3]. Relative 
landmarks, obtained by adequately aggregating landmarks, can be used to address 
this issue. In this paper we have used the Adjusted Ratio of Ratios [14], a multi- 
criteria relative performance measure that takes accuracy and time into account, 
for that purpose. 

We have empirically evaluated the combination of these two improvements 
to landmarking, referred to as sampling-based relative landmarks. Here, they 
were based on very small samples. We compared them to a set of traditional 
data characterization measures and also to rankings generated directly from the 
sampling-based landmarks. In our experiments with a Nearest-Neighbor ranking 
system that also used the ARR multicriteria measure, the best results in gen- 
eral were obtained with sampling-based relative landmarks when the number of 
neighbors ranged from 10% to 20% of the total number of data sets. 

Although the results are good, there is still plenty of work to do. Concerning 
sampling-based landmarks, we intend to study the effect of increasing sample 
size and other sampling methods. We will also try to reduce the variance of the 
estimates and to generate simple estimators of the learning curves of each 
algorithm. We plan to combine sampling-based landmarks with a selection of 
traditional characterization measures. We also intend to improve relative landmarks, 
namely by analyzing other ways of aggregating performance and to apply rela- 
tive landmarks to traditional landmarks, i.e. obtained by simplifying algorithms. 
Finally, we note that both frameworks, sampling-based and relative landmarks, 
can be applied in settings where multicriteria performance measures other than 
ARR are used [9] as well as settings where accuracy is the only important 
performance criterion. 

Abstract. A general framework for the design of error adaptive learning 
algorithms in multiple output domains based on Dietterich's ECOC approach, 
recursive error output correcting codes and iterative APP decoding methods is 
proposed. A particular class of these Recursive ECOC (RECOC) learning 
algorithms based on Low Density Parity Check (LDPC) codes is presented. 



1 Introduction 

As the ECOC [1] decoding strategy is roughly of a hard decision type, the final 
hypothesis cannot exploit the knowledge arising from the observed binary learners’ 
performances. In this way, the ECOC approach cannot lead to any error adaptive 
learning algorithm except when used in the core of AdaBoost variants [2]. This is 
striking considering that the original AdaBoost formulation is essentially binary 
oriented. Furthermore, AdaBoost and ECOC are treated as different approaches in 
the Machine Learning literature [3], even though ECOC should be a natural 
extension of AdaBoost. This work follows the approach introduced in [4][5]. In this 
paper, a particular instance of RECOC learning based on Low-Density Parity Check 
(LDPC) codes [6] is presented. 

The remainder of this paper is organized as follows. In Section 2, we review the 
RECOC approach. In Section 3, we present the RECOC_LDPC learning algorithm. In 
Section 4, we present experimental results. Finally, in Section 5, conclusions and 
further work are presented. 



2 Low Complexity Error Adaptive Learning Algorithms 

In previous work [4], we demonstrated that AdaBoost decision could be interpreted as 
an instance of threshold decoding [7] for a T-repetition* code under the assumption of 
a binary discrete memoryless channel constructed by the ensemble of binary weak 
learners. 

* Each codeword of length T carries a unique information bit and T-1 replicas. The unique 
informative codeword bit can be recovered by simple majority or threshold decoding. 

Let us consider the application of this result to the design of a generalized 
class of error adaptive ECOC algorithms based on binary error correcting codes. As a 
first example, we can think about the class of ECOC algorithms induced by binary 
linear codes. Each M-valued output label can be broken down into k, 
$k \ge \lceil \log_2 M \rceil$, binary labels by means of a suitable quantization 
process. Afterwards, each k-bit stream can be channel encoded using codewords of 
length n > k. Following the ECOC approach, a noisy codeword h can be computed at 
the learner’s side so that the learning problem can be expressed as a general 
decoding instance. Decoding is in essence a decision making process. Given a 
received noisy codeword, a decoder tries to figure out a transmitted information bit 
or a transmitted codeword. Maximum Likelihood decoding, while minimizing the 
probability of codeword error, does not necessarily minimize the probability of 
symbol (bit) error. Symbol Maximum A Posteriori (MAP) decoding methods effectively 
minimize bit error probability by computing the maximum on the set of a posteriori 
probabilities (APP) [7]. Let $c_i$ be a codeword bit in a codeword C of length n with 
k informative codeword bits at the first k positions. The APP probabilities are as 
follows 

$$p(c_i \mid \text{context}), \quad 1 \le i \le k \qquad (1)$$

The event context refers in general to all available information at symbol 
decoding time including channel statistics, the received noisy codeword h and the 
mathematical description of the coding scheme. For linear block codes such structure 
is defined by the parity check matrix H or the generator matrix G. 

Definition (linear block code) A linear (n, k) block code is defined by a k × n binary 
generator matrix G so that a k-bit binary source message U is encoded as an n-bit vector or 
codeword C = U · G mod 2. In addition, for a systematic code, $G = [I_k \; P]$. Alternatively, a 
linear block code can be defined through a parity check matrix H for which the 
relation $H \cdot C^T = 0 \bmod 2$ holds, $C^T$ being the transposed codeword. 

In our learning-decoding framework, channel statistics are defined by statistical 
behavior of binary learners' errors. In this way, we can think about a discrete 
memoryless channel constructed in a training phase and explicitly used in future 
predictions by means of suitable APP decoding algorithms. The use of APP decoding 
techniques for ECOC expansions based on binary linear codes allows the design of 
error adaptive learning algorithms in arbitrary output domains. It should be noted, 
however, that error adaptation is only one of the learning constraints. It is well known 
that ECOC schemes should resemble random coding in order to achieve successful 
learning. ECOC codes found by exhaustive search cannot be decoded by APP 
methods as no underlying mathematical structure can be defined on them. It is 
worthwhile to note that the ECOC formulation though requiring almost 
pseudorandom codes does not take advantage of the random coding characteristic in 
the decoding step. To be fair, one should remember that practical coding theory did 
the same for almost 50 years until the development of iterative decoding techniques 
[8] for recursive error correcting codes [9] like Turbo Codes [10] [11] or LDPC codes. 







Elizabeth Tapia et al. 



These binary linear codes are by construction inherently pseudorandom. Decoding is 
performed in a recursive way, each time using APP decoding methods. Given 
these considerations, we will focus our attention on ECOC-like algorithms based on 
binary linear recursive error correcting codes and iterative APP decoding techniques, 
i.e. the so-called RECOC models [5]. 

By means of a simple example, let us explain how the above considerations fit in 
the design of low complexity error adaptive learning algorithms in non-binary output 
domains. Let us assume M = 16 so that k = 4 can be assumed. Each output label is 
first expressed as a bitstream $u = (u_0, u_1, u_2, u_3)$ and then channel encoded by 
means of a simple (8,4,4)² extended Hamming code. The code is systematic with 
sixteen different codewords. A transmitted codeword 
$C = (u_0, u_1, u_2, u_3, z_4, z_5, z_6, z_7)$ has parity bits $z_i$, $4 \le i \le 7$, 
defined as follows 

$$
\begin{aligned}
z_4 + u_0 + u_1 + u_2 &= 0 \\
z_5 + u_0 + u_1 + u_3 &= 0 \\
z_6 + u_0 + u_2 + u_3 &= 0 \\
z_7 + u_1 + u_2 + u_3 &= 0
\end{aligned}
\qquad (2)
$$

Each equation in (2) can be considered a simple parity check code for the 
transmission of three information bits. In learning terms, our (8,4,4) code can 
also be defined in terms of four parity check subcodes, each of them involved in the 
learning of some target concept in an M’=8 output space. This recursive view can be 
modeled by means of the so-called Tanner graphs [9] [12]. The set of parity coding 
constraints $S_j$, $0 \le j \le 3$, and codeword bits define the nodes in the graph while 
edges are put connecting parity constraints to codeword bits (see Fig. 1). 
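The encoding step of this example can be sketched as follows. The sketch assumes that the four parity checks cover the information-bit triples (u0, u1, u2), (u0, u1, u3), (u0, u2, u3) and (u1, u2, u3), which is the assignment that gives every check three information bits and yields minimum distance 4; the function names are hypothetical.

```python
from itertools import product

# Information-bit indices entering each parity check (assumed assignment).
PARITY_SETS = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]


def encode(u):
    """Systematic encoding: four information bits followed by four parity bits."""
    parity = [sum(u[i] for i in idx) % 2 for idx in PARITY_SETS]
    return list(u) + parity


def satisfies_checks(c):
    """True if all four parity constraints hold for the 8-bit word c."""
    return all((c[4 + r] + sum(c[i] for i in idx)) % 2 == 0
               for r, idx in enumerate(PARITY_SETS))


# All sixteen codewords; for a linear code the minimum distance equals
# the smallest weight of a non-zero codeword.
codewords = [encode(u) for u in product([0, 1], repeat=4)]
min_distance = min(sum(c) for c in codewords if any(c))
```

Enumerating the codewords in this way is feasible only because the code is tiny; it is meant to make the (n, k, d) notation concrete rather than to suggest a practical procedure.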

Redundant concepts are introduced in order to cope with the underlying learning 
noise. A received noisy codeword $h = (h_0, h_1, \ldots, h_7)$ is the result 
of a random toggling process on codeword bits (binary concepts) in $C = [c_i]$, 
$0 \le i \le 7$. A simple model for this toggling process is a binary discrete 
additive one. 

$$f_i(c_i) = P(h_i \mid c_i) = \begin{cases} p_i & h_i \neq c_i \\ 1 - p_i & h_i = c_i \end{cases} \qquad 0 \le i \le 7 \qquad (3)$$



In this way, the above set of constraints can be directly introduced in a Tanner 
graph. The $h_i$ variables are instantiated at codeword reception time as they are the 
binary learners’ predictions. In addition, information bits (binary target concepts 
defining the M-valued one) can be constrained by a prior distribution 
$g_i = p(u_i)$, $0 \le i \le 3$. The resulting extended Tanner graph is known as the 
code Factor graph [8] (Fig. 1). A RECOC rule requires the computation of 
probabilities $p(c_i \mid f_0, \ldots, f_{n-1}, g_0, \ldots, g_{k-1})$ so that we can give 
an estimate $c_i^*$ of each binary informative target concept 

$$c_i^* = \arg\max_{c_i} \; p(c_i \mid f_0, \ldots, f_{n-1}, g_0, \ldots, g_{k-1}), \quad i = 0, \ldots, k-1 \qquad (4)$$

² Using standard coding notation, it denotes an (n, k, d) linear block code, n being the 
codeword length, k the number of information bits and d the minimum code distance. 

These probabilities can be computed by a message-passing algorithm, namely a 
generalized version of Pearl's Belief Propagation [13] algorithm in graphs with 
cycles. The observed results in the coding field are almost optimal, despite the 
presence of cycles. The only requirement would be knowledge of the probabilities 
$p_i$, $0 \le i \le n-1$, governing the learning noise. However, these probabilities 
can be estimated from the training error responses achieved by the set of binary 
learners under the training sets $TS_i$ induced in the ECOC encoding training phase 
from the original training sample TS. 

$$p_i = P_{TS_i}[h_i(x) \neq y], \quad 0 \le i \le n-1 \qquad (5)$$
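For a code as small as the (8,4,4) example, the bitwise a posteriori probabilities can be computed exactly by enumerating all sixteen codewords, which gives a reference point for what belief propagation approximates. The sketch below is not the message-passing algorithm itself. It assumes a uniform prior on the information bits, independent bit errors with probabilities p_i, and the same hypothetical parity-check assignment as in the encoding example.

```python
from itertools import product

# Assumed parity-check assignment for the (8,4,4) code.
PARITY_SETS = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]


def encode(u):
    """Systematic encoding of four information bits into eight code bits."""
    return list(u) + [sum(u[i] for i in idx) % 2 for idx in PARITY_SETS]


def estimate_error_rates(predictions, targets):
    """p_i: fraction of training cases on which binary learner i errs.

    predictions[i][t] and targets[i][t] are the predicted and true
    value of bit i for training case t.
    """
    return [sum(a != b for a, b in zip(pred, true)) / len(true)
            for pred, true in zip(predictions, targets)]


def map_decode(h, p, k=4):
    """Return (bitwise MAP estimates, P(u_i = 1 | h)) by exhaustive enumeration."""
    post_one = [0.0] * k
    total = 0.0
    for u in product([0, 1], repeat=k):
        c = encode(u)
        likelihood = 1.0
        for h_i, c_i, p_i in zip(h, c, p):
            # Binary channel model: flip with probability p_i.
            likelihood *= p_i if h_i != c_i else 1.0 - p_i
        total += likelihood
        for i in range(k):
            if u[i]:
                post_one[i] += likelihood
    post_one = [v / total for v in post_one]
    return [int(v > 0.5) for v in post_one], post_one


sent = encode((1, 0, 1, 1))
received = sent[:]
received[5] ^= 1                      # one binary learner errs
estimate, posteriors = map_decode(received, [0.1] * 8)
```

The cost of enumeration grows as 2^k, which is why iterative decoding on the factor graph is used for longer codes.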




Fig. 1. Factor Graph for the (8,4,4) extended Hamming code 



3 RECOC_LDPC 

In order to obtain good results with iterative decoding techniques applied to recursive 
codes based on simple parity check codes, some additional constraints on the Tanner 
graph structure are required. These constraints are met by the so-called LDPC codes. 

Definition LDPC codes [6]. A Low-Density Parity Check (LDPC) code is specified by a 
pseudorandom parity check matrix H containing mostly 0's and a small number of 1's. A 
binary (n, j, k) LDPC code has block length n and a parity-check matrix H with exactly j 
ones per column and k ones in each row, assuming j ≥ 3 and k > j. Thus every code bit is 
checked by exactly j parity checks and every parity check involves k codeword bits. The 
typical minimum distance of these codes increases linearly with n for fixed k and j. 

Let us consider the LDPC code shown in Fig. 2 through its factor graph 
representation. Such a code is constructed from five parity check subcodes, thus 
giving parity coding constraints $S_j$, $0 \le j \le 4$. LDPC codes can be decoded by a 
generalization of Pearl’s Belief Propagation algorithm [14][15]. Component subcodes, 
i.e. parity constraints, behave like communication sockets for the set of involved 
binary learners $WL_i$ issuing the observed binary predictions $h_i$ on binary target 
concepts $c_i$, $0 \le i \le 9$. It should be noted that the BP algorithm is concerned with 
computation of conditional probabilities over cycle-free graphs and factor graph 
models are far away from being cycle-free. However, iterative decoding algorithms 
on Tanner graphs with cycles based on the BP algorithm work almost optimally. 
Therefore, they may also perform well for our RECOC_LDPC learning algorithm. 
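The regular (n, j, k) structure in the definition above can be illustrated with a band construction in the style of Gallager: the matrix is a stack of j bands, the first with k consecutive ones per row and the others random column permutations of it. This is only a sketch with hypothetical function names; it is not the code-generation software used in the experiments, and it makes no attempt to avoid short cycles or repeated rows.

```python
import random


def gallager_parity_check(n, j, k, seed=0):
    """Build a regular (n, j, k) parity-check matrix as a list of rows."""
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    rng = random.Random(seed)
    rows_per_band = n // k
    # First band: row r checks the k consecutive bits r*k .. r*k + k - 1.
    base = [[1 if r * k <= c < (r + 1) * k else 0 for c in range(n)]
            for r in range(rows_per_band)]
    H = [row[:] for row in base]
    for _ in range(j - 1):
        perm = list(range(n))
        rng.shuffle(perm)
        # Each further band is a random column permutation of the first.
        H.extend([[row[perm[c]] for c in range(n)] for row in base])
    return H


def is_regular(H, j, k):
    """Check for exactly j ones per column and k ones per row."""
    rows_ok = all(sum(row) == k for row in H)
    cols_ok = all(sum(row[c] for row in H) == j for c in range(len(H[0])))
    return rows_ok and cols_ok


H = gallager_parity_check(n=20, j=3, k=4)
```

Each band touches every column exactly once, so stacking j bands gives column weight j and row weight k by construction, with n·j/k rows in total.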




RECOC_LDPC Algorithm 

Input 
  (n, j, k) LDPC code defined by the parity check matrix H and the generator matrix G 
  Quantizer $Q_M$ with k bits per input in {1, ..., M} 
  Training Sample TS, Distribution D on TS 
  Binary Weak Learner WL, Iterations I for BP 

Processing 
  RECOC(TS, $Q_M$, G, $WL_0, \ldots, WL_{n-1}$, $p_0, \ldots, p_{n-1}$) 

Output 
  $h_f(x) = Q_M^{-1}(BP(H, x, I, WL_0, \ldots, WL_{n-1}, p_0, \ldots, p_{n-1}))$ 

The RECOC procedure performs the ECOC encoding stage but with additional 
computation of the training error responses $p_i$, $0 \le i \le n-1$. In the 
prediction stage, this set will be used in the BP decoding with I iterative 
decoding steps. Then, a set of k informative binary target concepts will be 
estimated. Finally, a final hypothesis will be obtained by application of the 
inverse quantization process. 
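The BP decoding step of the prediction stage can be sketched with a generic sum-product decoder working on log-likelihood ratios. This is an illustrative implementation under stated assumptions, not the library used in the experiments: the function name is hypothetical, the channel is the independent bit-flip model with error probabilities p_i, and for brevity the example runs on the small (8,4,4) code with an assumed parity-check assignment rather than on an LDPC matrix.

```python
import math


def bp_decode(H, h, p, iterations):
    """Sum-product decoding in the log-likelihood-ratio (LLR) domain.

    H: parity-check matrix (list of rows), h: received bits,
    p: per-bit error probabilities (0 < p_i < 1), iterations: BP steps.
    Returns hard decisions for all n codeword bits.
    """
    n = len(h)
    checks = [[i for i in range(n) if row[i]] for row in H]
    # Channel LLR: positive values favour bit 0.
    llr = [(1 - 2 * h[i]) * math.log((1 - p[i]) / p[i]) for i in range(n)]
    bit_to_check = {(r, i): llr[i] for r, idx in enumerate(checks) for i in idx}
    total = llr[:]
    for _ in range(iterations):
        check_to_bit = {}
        for r, idx in enumerate(checks):
            for i in idx:
                prod = 1.0
                for j in idx:
                    if j != i:
                        prod *= math.tanh(bit_to_check[(r, j)] / 2.0)
                prod = max(-0.999999, min(0.999999, prod))  # keep atanh finite
                check_to_bit[(r, i)] = 2.0 * math.atanh(prod)
        total = llr[:]
        for (r, i), message in check_to_bit.items():
            total[i] += message
        # Extrinsic update: exclude the message that came from the same check.
        for (r, i) in bit_to_check:
            bit_to_check[(r, i)] = total[i] - check_to_bit[(r, i)]
    return [0 if t >= 0 else 1 for t in total]


# Parity-check matrix of the (8,4,4) code (assumed check assignment).
H = [
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 1],
]
received = [1, 0, 0, 0, 0, 0, 0, 0]   # all-zero codeword with bit 0 flipped
decoded = bp_decode(H, received, [0.1] * 8, iterations=1)
```

With zero iterations the decoder simply returns the received bits, which corresponds to the uncoded case; the weights log((1 - p_i)/p_i) are what make the decision adaptive to each learner's training error.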



4 Experimental Results for RECOC_LDPC learning 

We have tested the RECOC_LDPC algorithm on various UCI learning domains. 
Learning algorithms were developed using the public domain Java WEKA library. 
Therefore, AdaBoost (AB), Decision Stump³ (DS) and C4.5⁴ (denoted by C4 for the 
multiclass case) implementation details can be fully determined from the WEKA 
documentation. For LDPC code creation and the BP algorithm, we developed a 
library extension of the WEKA [16] software using D. J. MacKay's public domain 
software [15]. We adopted (n, 4, k) LDPC output encoding schemes with 
$n = \lceil \log_2 M \rceil / R$, R being the channel rate in number of information bits 
(informative binary target concepts) per codeword length n. Let us denote by 
RECOC_LDPC(R) + DS the test error rate for RECOC_LDPC learning at channel rate 
R using DS learners. Also, let us denote by RECOC_LDPC(R) + AB(T) + DS the test 
error for RECOC_LDPC learning at channel rate R with T inner binary AdaBoost 
boosting steps on DS learners. Test error responses for the learning domains Anneal, 
Primary Tumor, Lymph and Audiology against the maximum number of iterations I 
used in the BP algorithm are shown in Fig. 3 to Fig. 6. Test error responses without 
channel encoding, i.e. a bare transmission of the quantization scheme on the 
M-valued output domain, are shown for I = 0. 




Fig. 3. Anneal. RECOC_LDPC allows test error below C4. 

Fig. 4. Primary Tumor. RECOC_LDPC allows test error below C4. 

Fig. 5. Lymph. RECOC_LDPC with inner boosting allows test error below C4. 

Fig. 6. Audiology. RECOC_LDPC cannot learn below C4. 

³ Decision tree with only one node. 

⁴ R. Quinlan’s decision tree algorithm in its C4.5 Revision 8 public version. 

The observed results show that aside from C4.5 benchmark comparisons, in all 
cases the desired boosting effect is achieved with respect to the base learning model 
(DS learners). The observed results are consistent with the typical BP recursive 
decoding performance for LDPC codes, i.e. better performance for increasing number 
of iteration steps and decreasing channel rates (not shown). In addition, performance 
results suggest that for cases where RECOC_LDPC has an error floor (Audiology), 
the problem might lie in the underlying code itself, i.e. a representation problem 
might be causing such undesirable behavior. 



5 Conclusions and Further Work 

The main contribution of this work has been the development of a general decoding 
framework for the design of low-complexity error adaptive learning algorithms. From 
this decoding view of the learning from examples problem, a channel is constructed in 
the training phase and it is later used in a prediction stage. Learning algorithms in this 
model play a decoding function by means of APP decoding methods under the 
assumption that a Recursive Error Correcting Output enCoding (RECOC) expansion 
has taken place at a transmitter or teacher side. This framework embodies the standard 
AdaBoost algorithm as an instance of threshold decoding for simple repetition codes. 
RECOC learning models can be described by coding related graphical models. In 
addition, RECOC predictions can be explained by general message-passing schemes 
on their underlying graphical models. This is an appealing description of learning 
algorithms, which allows an intuitive but powerful design approach. A number of 
directions for further work and research stand out. Besides its validation with real 
data, it remains to study convergence characteristics, the effect of channel model 
assumptions and alternative binary learning schemes. Good alternatives for further 
research are Gallager's algorithms A and B [6] and a simplified implementation [17] 
of the BP algorithm. 

Abstract. Regression trees are models developed to deal with multiple 
regression data analysis problems. These models fit constants to a set 
of axes-parallel partitions of the input space defined by the predictor 
variables. These partitions are described by a hierarchy of logical tests 
on the input variables of the problem. Several authors have remarked that 
the preference criteria used to select these tests have a clear preference 
for what is known as end-cut splits. These splits lead to branches with 
a few training cases, which is usually considered as counter-intuitive by 
the domain experts. In this paper we describe an empirical study of the 
effect of this end-cut preference on a large set of regression domains. The 
results of this study, carried out for the particular case of least squares 
regression trees, contradict the prior belief that this type of test should 
be avoided. As a consequence of these results, we present a new method 
to handle these tests that we have empirically shown to have better 
predictive accuracy than the alternatives that are usually considered in 
tree-based models. 



1 Introduction 

Regression trees [6] handle multivariate regression problems, yielding models 
that have proven to be quite interpretable and to have competitive predictive 
accuracy. Moreover, these models can be obtained with a computational efficiency 
that is hardly matched by competing approaches, making them a good choice for 
a large variety of data mining problems where these features 
play a major role. 

Regression trees are usually obtained using a least squares error criterion that 
guarantees certain mathematical simplifications [8, Sec. 3.2] that further enhance 
the computational efficiency of these models. This growth criterion assumes the 
use of averages in tree leaves, and can be seen as trying to find partitions that 
have minimal variance (i.e. squared error with respect to the average target 
value). The main drawback of this type of tree is that the presence of 
a few outliers may both distort the average and strongly influence 
the choice of the best splits for the tree nodes. In effect, as we will see in this 
paper, the presence of outliers¹ may lead to the choice of split tests that have a 
very small sub-set of cases in one of the branches. Although these splits are the 
best according to the least squares error criterion they are counter-intuitive to the 
user and as we will see may even degrade predictive performance on unseen data. 
Users find it hard to understand that trees have top level nodes with branches 
that are very specific. Most users expect that top level nodes “discriminate” 
among the most relevant groups of observations (e.g. the observations with very 
high value of the target variable and the others). 
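The effect described above can be reproduced on a toy example. The sketch below searches the best least squares split on a single numeric predictor; the function names and the data are hypothetical and only meant to show how one outlier pulls the chosen split towards an end-cut.

```python
def sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, n_left, n_right) minimising the SSE of both branches."""
    pairs = sorted(zip(x, y))
    best = None
    for cut in range(1, len(pairs)):
        if pairs[cut - 1][0] == pairs[cut][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:cut]]
        right = [v for _, v in pairs[cut:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            threshold = (pairs[cut - 1][0] + pairs[cut][0]) / 2.0
            best = (error, threshold, len(left), len(right))
    return best[1:]


x = list(range(1, 11))
y_clean = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y_outlier = [1, 1, 1, 1, 1, 2, 2, 2, 2, 100]

balanced = best_split(x, y_clean)     # separates the two groups of five cases
end_cut = best_split(x, y_outlier)    # isolates the single outlier
```

On the clean data the split separates the two natural groups, whereas with the outlier the criterion prefers a branch containing a single case, because isolating it removes almost all of the squared error.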

The work presented in this paper addresses the problem of allowing this type 
of splits in regression trees, which is known as the end-cut preference problem 
[2, p. 313-317]. We study this type of splits and their effect on both predictive 
accuracy and interpretability of the models. We compare this to the alternative 
of avoiding this type of splits in the line of what was proposed by Breiman and 
colleagues [2]. Our extensive experimental comparison over 63 different regression 
problems shows that the differences in terms of predictive accuracy of both 
alternatives are quite often statistically significant. However, the overall number 
of significant differences does not show a clear winner, which contradicts prior 
belief on the effect of end-cut preference in tree-based regression models. 

In this paper we propose an alternative method that allows end-cut preference 
only in lower levels of the trees. The motivation behind this method is to avoid 
these splits in top level nodes, which is counter-intuitive for the users, but at the 
same time use them in lower levels as a means to avoid their negative impact in 
the accuracy of trees using least squares error criteria. Our experimental com- 
parisons show a clear advantage of this method in terms of predictive accuracy 
when compared to the two alternatives mentioned before. 

In the next section we present a brief description of least squares regression 
trees methodology and of the end-cut preference problem. Section 3 presents an 
experimental comparison between the alternatives of allowing and not allowing 
end-cut splits. In Section 4 we describe our proposed approach to handle the 
end-cut preference problem, and present the results of comparing it to the other 
alternatives. Finally, in Section 5 we provide a deeper discussion of the study 
carried out in this paper. 



2 Least Squares Regression Trees 

A regression tree can be seen as a kind of additive regression model [4] of the
form,

$$rt(x) = \sum_{i=1}^{l} k_i \times I(x \in D_i) \qquad (1)$$

where the $k_i$'s are constants; $I(\cdot)$ is an indicator function returning 1 if
its argument is true and 0 otherwise; and the $D_i$'s are disjoint partitions of the
training data $D$ such that $\bigcup_{i=1}^{l} D_i = D$ and
$\bigcap_{i=1}^{l} D_i = \emptyset$.

^1 These may be “real” outliers or noisy observations.

These models are sometimes called piecewise constant regression models.
Regression trees are constructed using a recursive partitioning (RP) algorithm. 
This algorithm builds a tree by recursively splitting the training sample into 
smaller subsets. The algorithm has three key issues: 

— A way to select a split test (the splitting rule) . 

— A rule to determine when a tree node is terminal. 

— A rule for assigning a model to each terminal node (leaf nodes). 
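As a rough illustration of how these three components fit together, the following Python sketch grows a tree recursively. The names `grow`, `find_split`, `is_terminal` and `leaf_model`, as well as the toy median-based splitting rule, are assumptions made for this example and are not taken from any particular system.

```python
from statistics import mean

def grow(cases, find_split, is_terminal, leaf_model):
    """Recursive partitioning over a list of (x, y) cases (hypothetical sketch)."""
    if is_terminal(cases):                      # rule deciding when a node is terminal
        return {"leaf": leaf_model(cases)}      # rule assigning a model to the leaf
    test = find_split(cases)                    # splitting rule
    if test is None:
        return {"leaf": leaf_model(cases)}
    left = [c for c in cases if test(c[0])]
    right = [c for c in cases if not test(c[0])]
    if not left or not right:                   # degenerate split: stop here
        return {"leaf": leaf_model(cases)}
    return {
        "test": test,
        "left": grow(left, find_split, is_terminal, leaf_model),
        "right": grow(right, find_split, is_terminal, leaf_model),
    }

def median_split(cases):
    """Toy splitting rule (an assumption): cut at the lower median of x."""
    xs = sorted(x for x, _ in cases)
    if xs[0] == xs[-1]:
        return None
    threshold = xs[len(xs) // 2 - 1]
    return lambda x: x <= threshold

def predict(tree, x):
    """Follow the tests down to a leaf and return its constant."""
    while "leaf" not in tree:
        tree = tree["left"] if tree["test"](x) else tree["right"]
    return tree["leaf"]

toy = [(1, 10.0), (2, 10.0), (3, 20.0), (4, 20.0)]
tree = grow(toy, median_split,
            lambda cs: len(cs) <= 2,
            lambda cs: mean(y for _, y in cs))
```

Swapping `median_split` for a least squares splitting rule, and the size-based stopping rule for a more careful one, would leave the recursive skeleton unchanged.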

Assuming the minimization of the least squares error it can be easily proven
(e.g. [8]) that if one wants to use constant models in the leaves of the trees, the
constant to use in each terminal node should be the average target variable of
the cases falling in each leaf. Thus the error in a tree node can be defined as,

$$Err(t) = \frac{1}{n_t} \sum_{D_t} (y_i - \bar{y}_t)^2 \qquad (2)$$

where $D_t$ is the set of $n_t$ training samples falling in node $t$; and $\bar{y}_t$
is the average target variable ($Y$) value of these cases.

The error of a regression tree can be defined as,

$$Err(T) = \sum_{l \in \tilde{T}} P(l) \times Err(l)
         = \sum_{l \in \tilde{T}} \frac{n_l}{n} \times \frac{1}{n_l} \sum_{D_l} (y_i - \bar{y}_l)^2
         = \frac{1}{n} \sum_{l \in \tilde{T}} \sum_{D_l} (y_i - \bar{y}_l)^2 \qquad (3)$$

where $\tilde{T}$ is the set of leaves of tree $T$; and $P(l)$ is the probability of a
case falling in leaf $l$ (which is estimated with the proportion of training cases
falling in the leaf).
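The two error definitions can be made concrete with a small Python sketch. It reads the node error as the mean squared deviation from the node average and the tree error as the sum of leaf errors weighted by the proportion of cases in each leaf; the function names and the toy leaves are assumptions for illustration only.

```python
def node_error(targets):
    """Mean squared deviation of the target values from their average."""
    n = len(targets)
    avg = sum(targets) / n
    return sum((y - avg) ** 2 for y in targets) / n

def tree_error(leaves):
    """Leaf errors weighted by the proportion of training cases in each leaf."""
    n = sum(len(leaf) for leaf in leaves)
    return sum((len(leaf) / n) * node_error(leaf) for leaf in leaves)

# Hypothetical tree with two leaves holding 2 and 4 training cases.
leaves = [[1.0, 3.0], [10.0, 10.0, 10.0, 10.0]]
print(node_error(leaves[0]), tree_error(leaves))
```

In this toy case the first leaf has error 1 and the second has error 0, so the tree error is simply 2/6 of the first leaf's error.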

During tree growth, a split test $s$ divides the cases in node $t$ into a set
of partitions. The decrease in error of the tree resulting from this split can be
measured by,

$$\Delta Err(s, t) = Err(t) - \sum_{i} \frac{n_i}{n} \times Err(t_i) \qquad (4)$$

where $Err(t_i)$ is the error on the subset of cases of branch $i$ of the split test $s$.

The use of this formula to evaluate each candidate split would involve several 
passes through the training data with the consequent computational costs when 
handling problems with a large number of variables and training cases. This 
would be particularly serious, in the case of continuous variables that are known 
to be the major computational bottleneck of growing tree-based models [3]. 
Fortunately, the use of the least squares error criterion, and the use of averages 
in the leaves, allow for further simplifications of the formulas described above. 
In effect, as proven in [8], for the usual setup of binary trees where each node 
has only two sub-branches, $t_L$ and $t_R$, the best split test for a node is the
test $s$ that maximizes the expression,

$$\frac{S_L^2}{n_{t_L}} + \frac{S_R^2}{n_{t_R}} \qquad (5)$$

where $S_L = \sum_{D_{t_L}} y_i$ and $S_R = \sum_{D_{t_R}} y_i$.
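The link between the error decrease of Equation 4 and the sums-of-targets criterion of Equation 5 can be sketched in a few lines. This is only an outline, not the proof given in [8]; it assumes that $n$ in Equation 4 is the number of cases $n_t$ in node $t$, and it introduces $S_t$ (the sum of the target values in node $t$) as an auxiliary symbol.

```latex
\begin{align*}
Err(t) &= \frac{1}{n_t}\sum_{D_t} y_i^2 - \bar{y}_t^2
        = \frac{1}{n_t}\sum_{D_t} y_i^2 - \frac{S_t^2}{n_t^2},
        \qquad S_t = \sum_{D_t} y_i \\
\frac{n_{t_L}}{n_t}\, Err(t_L) &= \frac{1}{n_t}\Big(\sum_{D_{t_L}} y_i^2 - \frac{S_L^2}{n_{t_L}}\Big)
        \quad \text{(and similarly for } t_R\text{)} \\
\Delta Err(s,t) &= Err(t) - \frac{n_{t_L}}{n_t}\, Err(t_L) - \frac{n_{t_R}}{n_t}\, Err(t_R) \\
        &= \frac{1}{n_t}\Big(\frac{S_L^2}{n_{t_L}} + \frac{S_R^2}{n_{t_R}}\Big) - \frac{S_t^2}{n_t^2}
\end{align*}
```

The sums of squared targets cancel because $D_t$ is the union of $D_{t_L}$ and $D_{t_R}$. Since $S_t$ and $n_t$ do not depend on the split, ranking candidate splits by error decrease is the same as ranking them by $S_L^2/n_{t_L} + S_R^2/n_{t_R}$.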

This expression means that one can find the best split for a continuous variable
with just a single pass through the data, without needing to calculate averages
and sums of squared differences to these averages. One should stress that this
expression could only be derived due to the use of the least squares error
criterion and of averages in the leaves of the trees^2.
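A minimal Python sketch of such a single-pass search follows. It assumes the cases are first sorted by the variable, that candidate cut points are midpoints between consecutive distinct values, and that tests have the form `x <= cut`; these choices and the name `best_split` are illustrative assumptions rather than details of any specific implementation.

```python
from typing import List, Optional, Tuple

def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (cut_point, score) maximizing S_L^2/n_L + S_R^2/n_R, or None."""
    order = sorted(range(len(x)), key=lambda i: x[i])   # sort once by the variable
    n = len(y)
    total = sum(y)
    s_left = 0.0                                        # running sum of left targets
    best = None
    for pos in range(n - 1):
        i, nxt = order[pos], order[pos + 1]
        s_left += y[i]
        if x[i] == x[nxt]:
            continue                                    # no cut between equal values
        n_left = pos + 1
        s_right = total - s_left
        score = s_left ** 2 / n_left + s_right ** 2 / (n - n_left)
        if best is None or score > best[1]:
            best = ((x[i] + x[nxt]) / 2.0, score)
    return best

print(best_split([1, 2, 3, 4], [1, 1, 5, 5]))
```

Only the running sum and the case count are updated at each step, so no averages or squared differences are recomputed for the individual candidates.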

Breiman and colleagues [2] mention that since the work by Morgan and
Messenger [5] it is known that the use of the least squares criteria tends to favor
end-cut splits, i.e. splits in which one of the branches has a proportion of cases
near to zero^3.

To better illustrate this problem we describe an example of this end-cut
preference occurring in one of the data sets we will use in our experiments, the
Machine^4 domain. In this data set the best split test according to the error
criterion of Equation 5 for the root node of the tree is the test MMAX < 48000.
This split divides the 209 training cases in two sub-branches, one having only
4 observations. This is a clear example of an end-cut split. Figure 1 helps to
understand why this is the best split according to the criterion of Equation 5.

As can be seen in Figure 1, there are 4 observations (upper right part of
the figure) that have end-cut values in the variable MMAX, and at the same
time outlier values in the target variable. These are the two characteristics that
when appearing together lead to end-cut splits. Within this context, a candidate
split that “isolates” these cases in a single branch is extremely valuable in terms
of the least squares error criterion of Equation 5.

Allowing splits like MMAX < 48000 in the example above may lead to
trees that seem quite ad-hoc to users that have a minimal understanding of the
domain, because they tend to expect that top level nodes show highly general
relations and not very specific features of the domain. This is reinforced by the
fact that on most large data sets, trees do tend to be too deep for a user to grasp
all details, meaning that most users will only be able to capture top-level splits.
As such, although no extensive experimental comparisons have been carried out
till now^5, it has been taken for granted that end-cut splits are undesirable, and
most existing tree-based systems (e.g. CART [2], THAID [5] or C4.5 [7]) have
some mechanism for avoiding them. However, while the drawbacks in terms of user
expectations are irrefutable, as we will see in Section 3 the drawbacks of end-cut
splits in terms of predictive accuracy are not so clear at all in the case of least
squares regression trees.



^2 In [8] a similar expression was developed for the least absolute deviation criterion
with medians on the leaves of the trees.

^3 For a formal proof of end-cut preference see [2, p. 313-317].

^4 Available for instance in http://www.liacc.up.pt/~ltorgo/Regression/DataSets.html.

^5 To the best of our knowledge.

[Figure 1: plot not recovered; the only surviving axis label is MMAX.]

Fig. 1. An example of an end-cut preference problem.



3 An Experimental Analysis of the End-Cut Preference 

In this section we carry out an experimental study of the consequences of end-cut 
splits. Namely, we compare the hypothesis of allowing this type of splits, and 
the alternative of using some form of control to avoid them. 

In this experimental comparison we have used 63 different regression data 
sets. Their main characteristics (number of training cases, number of continuous 
variables, and number of nominal variables) are shown in Table 1. 

Regarding the experimental methodology, we have carried out 10 repetitions
of 10-fold cross validation experiments, in the light of the recent findings by
Bradford and Brodley [1] on the effect of instance-space partitions. The
significance of the observed differences was assessed through paired t-tests at
the 95% and 99% confidence levels.
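For readers unfamiliar with the test, the following Python sketch computes a paired t statistic from two lists of matched error scores. How the scores of the repeated cross validation runs are actually paired is not detailed here, so the sketch assumes ten paired values (9 degrees of freedom) and uses the standard two-sided critical values of Student's t for that case; the error values themselves are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(errors_a, errors_b):
    """t statistic of the per-pair differences (assumes they are not all equal)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Two-sided critical values of Student's t with 9 degrees of freedom.
T_CRIT_95 = 2.262
T_CRIT_99 = 3.250

def verdict(t):
    if abs(t) >= T_CRIT_99:
        return "significant at 99%"
    if abs(t) >= T_CRIT_95:
        return "significant at 95%"
    return "not significant"

# Hypothetical per-fold NMSE scores of two tree variants on the same folds.
variant_a = [0.41, 0.39, 0.44, 0.40, 0.42, 0.38, 0.43, 0.41, 0.40, 0.42]
variant_b = [0.43, 0.40, 0.47, 0.41, 0.45, 0.40, 0.44, 0.44, 0.41, 0.45]
t_example = paired_t_statistic(variant_a, variant_b)
print(t_example, verdict(t_example))
```

A negative statistic here means the first variant has the lower error on average; the sign tells which method "wins" and the magnitude at which confidence level.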

Table 1. The Used Data Sets.

Data Set                      Characteristics   Data Set                      Characteristics
                              (cases; cont.;                                  (cases; cont.;
                              nominal)                                        nominal)
Abalone (Ab)                  4177; 7; 1        Elevators (El)                8752; 40; 0
Delta Elevators (DE)          9517; 6; 0        Ailerons (Ai)                 7154; 40; 0
Kinematics (Ki)               8192; 8; 0        Telecomm (Te)                 15000; 26; 0
ComputerA (CA)                8192; 22; 0       ComputerS (CS)                8192; 12; 0
Algal1 (A1)                   200; 8; 3         Algal2 (A2)                   200; 8; 3
Algal3 (A3)                   200; 8; 3         Algal4 (A4)                   200; 8; 3
Algal5 (A5)                   200; 8; 3         Algal6 (A6)                   200; 8; 3
Algal7 (A7)                   200; 8; 3         Anastesia (An)                80; 6; 0
Auto-Mpg (AM)                 398; 4; 3         Auto-Price (AP)               159; 14; 1
Bank8FM (B8)                  4500; 8; 0        Bank32NH (B32)                4500; 32; 0
CloseNikkei (CN)              2000; 49; 1       CloseDow (CD)                 2399; 49; 1
Chlorophyll (Chl)             72; 4; 1          House8L (H8)                  22784; 8; 0
House16H (H16)                22784; 16; 0      Diabetes (Di)                 43; 2; 0
Pyrimidines (Py)              74; 24; 0         Triazines                     186; 60; 0
FacultyD5001 (Fa)             197; 33; 0        Employment (Em)               368; 18; 0
ArtificialD2 (D2)             40768; 2; 0       Industry (In)                 1555; 15; 0
Friedman Example (Fr)         40768; 10; 0      Housing (Ho)                  506; 13; 0
Machine CPU (Ma)              209; 6; 0         Marketing (Mkt)               944; 1; 3
Artificial MV (MV)            25000; 7; 3       Puma8NH (P8)                  4500; 8; 0
Puma32NM (P32)                4500; 32; 0       Servo                         167; 0; 4
WisconsinBreastCancer (WBC)   194; 32; 0        CaliforniaHousing (CH)        20460; 8; 0
Additive (Ad)                 30000; 10; 0      1KM (1KM)                     710; 14; 3
Acceleration (Ac)             1732; 11; 3       CO2-emission (CO2)            1558; 19; 8
CW Drag (CW)                  1449; 12; 2       Available Power (AP)          1802; 7; 8
Driving Noise (DN)            795; 22; 12       Fuel Town (FTw)               1764; 25; 12
Fuel Total (FTo)              1766; 25; 12      Fuel Country (FC)             1764; 25; 12
Maximal Torque (MT)           1802; 19; 13      Top Speed (TS)                1799; 17; 7
Maintenance Interval (MI)     1724; 6; 7        Heat (He)                     7400; 8; 4
Steering Acceleration (SAc)   63500; 22; 1      Steering Angle (SAn)          63500; 22; 1
Steering Velocity (SV)        63500; 22; 1      Fluid Discharge (FD)          530; 26; 6
Fluid Swirl (FS)              530; 26; 6        China (Ch)                    217; 9; 0
Delta Ailerons (DA)           7129; 5; 0

The first set of experiments we report compares the following two types of
least squares regression trees^6. The first type has no control over end-cut splits,
thus allowing them at any stage of the tree growth procedure as long as they are
better according to the criterion of Equation 5. The second type of tree does
not allow splits^7 that lead to branches that have fewer cases than a minimum value
established by the user. In our experiments we have set this minimum value to
10 cases.

^6 Both are implemented in system RT (http://www.liacc.up.pt/~ltorgo/RT/), and
they only differ in the way they handle end-cut splits. All other features are the
same.

^7 Both on continuous as well as nominal variables.
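The difference between the two types of trees can be illustrated on synthetic data. The Python sketch below searches for the best least squares split of a single continuous variable, optionally rejecting candidates that leave fewer than `min_cases` observations in a branch. The data are hypothetical (40 cases, of which the 3 with the largest values of the variable are target outliers) and the function name is an assumption for this example.

```python
def best_split_sizes(x, y, min_cases=1):
    """Return (n_left, n_right) of the split maximizing S_L^2/n_L + S_R^2/n_R."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n, total = len(y), sum(y)
    s_left, best_score, best_sizes = 0.0, None, None
    for pos in range(n - 1):
        s_left += y[order[pos]]
        n_left, n_right = pos + 1, n - pos - 1
        if x[order[pos]] == x[order[pos + 1]]:
            continue                      # no cut between equal values
        if n_left < min_cases or n_right < min_cases:
            continue                      # control over end-cut splits
        score = s_left ** 2 / n_left + (total - s_left) ** 2 / n_right
        if best_score is None or score > best_score:
            best_score, best_sizes = score, (n_left, n_right)
    return best_sizes

# Hypothetical data: a slow linear trend plus three extreme cases at the end.
x = list(range(40))
y = [10.0 + 0.5 * v for v in x[:37]] + [500.0, 520.0, 540.0]

unrestricted = best_split_sizes(x, y)               # end-cut splits allowed
controlled = best_split_sizes(x, y, min_cases=10)   # minimum of 10 cases per branch
print(unrestricted, controlled)
```

Without the restriction the search isolates the three outliers in their own branch; with a minimum of 10 cases per branch it is forced to a split whose smaller branch mixes the outliers with ordinary cases.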

Table 2 shows the results of the comparison between the two alternatives in
terms of Normalized Mean Squared Error (NMSE). Columns two and three show
the data sets where we observed a statistically significant win of some method
at different confidence levels, and the fourth column shows the cases where the
observed differences were not statistically significant.
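The NMSE statistic is not defined in the text. A common definition, assumed in the Python sketch below, divides the squared error of the model by the squared error of always predicting the average of the observed target values, so that values below 1 indicate a model that beats this naive baseline. Whether the baseline average is taken over test or training cases is a further assumption of this sketch (test cases are used).

```python
def nmse(y_true, y_pred):
    """Squared error of the model relative to predicting the average target."""
    avg = sum(y_true) / len(y_true)
    model_sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    baseline_sse = sum((y - avg) ** 2 for y in y_true)
    return model_sse / baseline_sse

print(nmse([1, 2, 3], [1, 2, 3]))   # perfect predictions
print(nmse([1, 2, 3], [2, 2, 2]))   # predicting the average everywhere
```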



Table 2. End-Cut versus No End-Cut in terms of NMSE.

                  99%                          95%      Not significant
End-Cut Wins      20                           2        15
                  Ad,AP,B32,Chl,CO2,CW,D2,     FC,FD    Ac,A1,A2,A3,A5,A6,A7,
                  FS,FTo,FTw,He,Ma,MI,MT,               Ch,CN,Fa,DN,Ho,Te,AP,WBC
                  An,Se,SAc,SAn,SV,TS
No End-Cut Wins   17                           2        7
                  1KM,Ab,Ai,B8,CH,CD,CA,       In,Py    A4,AP,Di,Mkt,MV,Em,Tr
                  CS,DA,DE,El,Fr,H16,H8,Ki,
                  P32,P8



The first thing to remark is that there is a statistically significant difference
between the two approaches on 41 of the 63 data sets. This reinforces the
importance of the question of how to handle end-cut splits. However, contrary to
our prior expectations based on previous works (e.g. [2]), we did not observe a
clear advantage of not using end-cut splits^8. On the contrary, there is a slight
advantage of the alternative allowing the use of end-cut splits at any stage of
tree growth (the average NMSE over all data sets of this alternative is 0.4080,
while not allowing end-cut splits leads to an average NMSE of 0.4140). The
main conclusion to draw from these results is that they provide strong empirical
evidence towards the need of a re-evaluation of the position regarding end-cut
splits in the context of least squares regression trees.

^8 Still, we must say that the method proposed in [2] for controlling these splits is
slightly different from the one used in our experiments.

Why should end-cut splits be beneficial in terms of predictive error? We
believe the best answer to this question is related to the statistic used to measure
the error. Least squares regression trees revolve around the use of averages and
squared differences to these averages (c.f. Section 2). The use of averages as
a statistic of centrality for a set of cases is known to suffer from the presence
of outliers. By not allowing the use of end-cut splits that tend to isolate these
outliers in a separate branch (c.f. Figure 1), every node will “suffer” the influence
of these extreme values (if they exist). This will distort the averages, which may
easily lead to larger errors as the predictions of the trees are obtained using the
averages in the leaves. Going back to the example described in Section 2 with
the Machine data set, if one does not allow the use of end-cut splits, instead of
immediately isolating the four outlier cases shown in Figure 1, they will end up
falling in a leaf that includes 10 other observations. This leaf has an average
target variable value of 553.64, which will be the prediction of the tree for every
test case falling in this leaf. However, 10 out of the 14 observations in this leaf
have a target value in the range [208..510]. Thus the distribution in this leaf is
clearly being skewed by the outliers, and this provides an idea of the risk of using
this leaf to make predictions. The same conclusion can be reached by looking at
the Mean Squared Errors^9 at the leaves of both trees. While the tree using
end-cut splits has an average MSE over all leaves of 5132.2, the tree without
end-cut splits has an average of 15206.4, again highlighting the effect of these
outliers, which clearly increase the variance in the nodes. One should remark that
this does not mean that the tree using end-cut splits is overfitting the data, as
both trees went through the same post-pruning process that is supposed to
eliminate this risk (moreover, the experimental comparisons that were carried out
show that this is not occurring, at least in a consistent way).
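The figures quoted for this leaf already show how strongly the outliers pull the average. The short Python calculation below uses only the numbers given above (14 cases, average 553.64, 10 cases with targets of at most 510) to bound the average of the remaining four cases from below; the variable names are ours.

```python
leaf_size = 14        # cases in the leaf
leaf_mean = 553.64    # average target value, i.e. the leaf's prediction
in_range = 10         # cases with target values in [208..510]
range_max = 510       # largest possible target among those 10 cases

total = leaf_size * leaf_mean
# The 10 in-range cases contribute at most 10 * 510 to the total.
min_outlier_sum = total - in_range * range_max
min_outlier_mean = min_outlier_sum / (leaf_size - in_range)
print(round(min_outlier_mean, 2))
```

So the four remaining cases must average at least about 662.74, while the leaf's prediction of 553.64 lies above every one of the other ten observations.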

In summary, although clearly going against the intuition of users regarding the
generality of the tests in the trees, end-cut splits provide some accuracy gains on
several data sets. This means that simply eliminating them can be dangerous if
one is using least squares error criteria. We should stress that the same
conclusions may not be valid if other error criteria were to be used, such as
least absolute deviations or even the criteria used in classification trees, as
these criteria do not suffer such effects of outliers.

As a consequence of the experimental results reported in Table 2 we propose 
a new form of dealing with end-cut splits that tries to fulfill the interpretability 
expectations of users that go against the use of end-cut splits, while not ignoring 
the advantages of these splits in terms of predictive accuracy. This new method 
is described in the next section. 



4 A Compromising Proposal for Handling End-Cut 
Preference 

The main idea behind the method we propose to deal with end-cut preference is 
the following. End-cut splits should not be allowed in top level nodes of the trees 
as they handle very specific (poorly represented in the training sample) areas of 
the regression input space, thus going against the interpretability requirements 
of most users. As such, our method will use mechanisms to avoid these splits in 
top level nodes of the trees, while allowing them in bottom nodes as a means to 
avoid the distorting effects that outliers have in the averages in the leaves. 

In order to achieve these goals we propose a simple method consisting of not
allowing end-cut splits unless the number of cases in the node drops below a
certain user-definable threshold^10. Moreover, as in the experiments of Section 3,
a test is considered an end-cut split if one of its resulting branches has fewer than
a certain number of cases^11. This means that nodes with a number of training
cases between the first and second of these thresholds are allowed to consider
end-cut splits. These are the bottom nodes of the trees^12.

^9 Larger values of MSE indicate that the values are more spread around the average.

^10 In the experiments we will report we have set this threshold to 100 cases.
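The rule just described can be sketched as a small Python predicate. The function and parameter names are hypothetical; the default values follow the thresholds reported for the experiments (100 cases for the node, 10 cases for a branch), and the example calls reuse the case counts given for the Machine data set.

```python
def split_allowed(n_node, n_left, n_right, node_threshold=100, branch_threshold=10):
    """Decide whether a candidate split may be used in a node with n_node cases."""
    is_end_cut = min(n_left, n_right) < branch_threshold
    if not is_end_cut:
        return True                      # ordinary splits are always allowed
    return n_node < node_threshold       # end-cut splits only in small (bottom) nodes

# Machine example: the root node holds 209 cases.
print(split_allowed(209, 205, 4))   # end-cut split at the root
print(split_allowed(209, 182, 27))  # ordinary split at the root
print(split_allowed(27, 23, 4))     # end-cut split in a node with 27 cases
```

The first candidate is rejected because the root is far above the node threshold, whereas the same kind of split is accepted once the node has shrunk to 27 cases.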

Our hypothesis is that with this simple method we will obtain trees that
are acceptable in terms of interpretability from the user perspective, but at the
same time will outperform both alternatives considered in Section 3 in terms
of predictive accuracy. With the purpose of testing this hypothesis we have
carried out an experiment similar to the one reported in Section 3, but now
comparing our proposed method with the two alternatives of allowing end-cut
splits everywhere, and not allowing them at all. The results of comparing our
method to the former alternative are shown in Table 3.



Table 3. Our method compared to allowing end-cut splits in terms of NMSE.

                  99%                          95%   Not significant
Our Method Wins   16                           0     22
                  Ab,Ad,Ai,CH,CD,CA,D2,DA,           1KM,A2,A3,A5,AP,B8,CS,
                  DE,El,H16,H8,In,Ki,Pu8,SV          DN,FD,FTo,FTw,FC,He,MI,
                                                     Mkt,Em,Pu32,SAc,SAn,TS,Tr
End-Cut Wins      2                            0     23
                  A1,Fa                              Ac,A6,A7,AM,B32,Chl,Ch,
                                                     CN,CO2,CW,Di,FS,Fr,Ho,Ma,
                                                     MT,MV,Te,AP,Py,An,WBC,Se



This comparison clearly shows that there is no particular advantage in
allowing end-cut splits everywhere, when compared to our proposal (with only
two exceptions). Moreover, our proposal ensures that this type of splits will
not appear in top level nodes of the trees^13, which fulfills the user’s expectations
in terms of interpretability of the models. In effect, going back to the Machine
example, with our proposal we would not have a root node isolating the four
outliers (as with the alternative of allowing end-cut splits), but they would still
be isolated in lower levels of the tree^14.

What this comparison also shows is that our proposal can outperform the 
method of allowing end-cut splits in several data sets. It is interesting to observe 
that most of these 16 cases are included in the set of 17 significant losses of the 
alternative allowing end-cut splits shown in Table 2. 

^11 We have used the value of 10 for this threshold.

^12 Unless the training sample has fewer than 100 cases, which happens in only
four of our 63 benchmark data sets (c.f. Table 1).

^13 Unless the data set is very small.

^14 Namely, the root node would consist of the split MMAX < 28000, which divides the
209 training cases in 182 and 27, respectively, and then the 27 cases (that include
the 4 outliers) would be split with the end-cut test MMAX < 48000 (c.f. Figure 1
to understand what is being done with these splits).

The results of comparing our method to the alternative of not allowing
end-cut splits are shown in Table 4.



Table 4. Our method compared to not allowing end-cut splits in terms of NMSE.

                  99%                             95%            Not significant
Our Method Wins   22                              2              13
                  Ad,AP,B32,Chl,CD,CO2,CW,D2,     FC,H8          Ac,A2,A3,A5,A6,A7,
                  FD,FS,FTo,FTw,He,Ma,MI,MT,                     DE,DN,H16,In,Mkt,AP,Tr
                  An,Se,SAc,SAn,SV,TS
No End-Cut Wins   10                              4              12
                  1KM,Ai,B8,CH,CA,                A1,Fa,Pu8,Py   Ab,A4,AM,Ch,CN,DA,
                  CS,El,Fr,Ki,Pu32                               Di,Ho,MV,Em,Te,WBC



Once again we observe a clear advantage of our proposal (24 significant wins),
although there are still 14 data sets where not allowing end-cut splits seems to
be preferable. However, comparing to the alternative of always allowing end-cut
splits, which has clear disadvantages from the user interpretability perspective,
our method clearly recovers some of the significant losses (c.f. Table 2). It is
also interesting to remark that with the single exception of the H8 data set, all
22 wins of the strategy using end-cut splits over the no-end-cuts approach, are
included in the 24 wins of our proposal. This means that our method fulfills our
objective of being able to take advantage of the gains in accuracy entailed by
the use of end-cut splits, in spite of not using them in top levels of the trees.

In spite of the advantages of our proposal, there are also some drawbacks 
that should be considered. Namely, there is a tendency for producing larger 
trees (in terms of number of leaves) than with the other two alternatives that 
were considered in this study. This is reflected in the results shown in Table 5, 
that presents the comparison in terms of number of leaves of our proposal with 
the other two alternatives. 



Table 5. Tree size comparison of our method with the other two alternatives.

              No End-Cut Splits                 All End-Cut Splits
              99%    95%    Not significant     99%    95%    Not significant
Our Wins      23     2      3                   1      0      15
Our Losses    31     0      5                   20     2      25



These results seem to contradict our goal of a method that produces trees
more interpretable to the user than the trees obtained when allowing end-cut
splits. Interpretability is known to be a quite subjective issue. Still, we claim that
in spite of having a larger number of leaves (c.f. Table 5), the trees obtained with
our method are more comprehensible. As we have mentioned before, in most
real-world large data sets, the trees obtained by this type of system are too large
for any user to be able to grasp all details. As such, we argue that only top-level
nodes are in effect “understood” by the user. As our method does not allow
end-cut splits in these top level nodes, we claim that this leads to trees that are
more comprehensible to the user.

5 Discussion 

The results of our empirical study on the effects of end-cut preference within
least squares regression trees lead us to the conclusion that there is no clear
winner between the two standard alternatives of allowing or not allowing the use
of end-cut splits. These results are somewhat surprising given the usual position
regarding the use of these splits. However, our study confirms the impact of the
method used to handle these splits on the predictive accuracy of least squares
regression trees. Our analysis of the reasons for the observed results indicates
that this study should not be generalized to other types of trees (namely
classification trees).

The method we have proposed to handle end-cut splits is based on the analysis
of the requirements of users in terms of interpretability, and also on the
results of our empirical study. By allowing end-cut splits only in lower levels of
the trees, we have shown that it is possible to outperform the other two
alternatives considered in the study in terms of predictive accuracy. Moreover,
this method avoids end-cut splits in top level nodes, which goes in favor of user
expectations in terms of comprehensibility of the trees. However, our results also
show that there is still some room for improvement in terms of predictive accuracy
when compared to the alternative of not allowing end-cut splits. Future work
should concentrate on finding less ad-hoc methods of controlling these splits, so
as to avoid some of the remaining significant losses in terms of predictive
accuracy. Moreover, the poor results in terms of tree size should also be
considered for future improvements of our proposal.

6 Conclusions 

We have described an empirical study of the effect of end-cut preference in the 
context of least squares regression trees. End-cut splits have always been seen 
as something to avoid in tree-based models. The main conclusion of our ex- 
perimental study is that this assumption should be reconsidered if one wants 
to maximize the predictive accuracy of least squares regression trees. Our re- 
sults show that allowing end-cut splits leads to statistically significant gains in 
predictive accuracy on 22 out of our 63 benchmark data sets. In spite of the 
disadvantages of end-cut splits in terms of the interpretability of the trees from 
the user perspective, these experimental results should not be disregarded. 

We have described a new form of dealing with end-cut splits that tries to take
into account our empirical observations. The simple method we have described
shows clear and statistically significant improvements in terms of predictive
accuracy. Still, we have also observed that there is room for further improvements.

Future work should try to improve the method we have described, and also
to carry out similar studies for tree-based models using different error criteria.

Abstract. Most of the existing data mining approaches to time series prediction
use as training data an embed of the most recent values of the time series,
following the traditional linear auto-regressive methodologies. However, in
many time series prediction tasks the alternative approach that uses derivative
features constructed from the raw data with the help of domain theories can
produce significant prediction accuracy improvements. This is particularly
noticeable when the available data includes multivariate information although
the aim is still the prediction of one particular time series. This latter situation
occurs frequently in financial time series prediction. This paper presents a
method of feature construction based on domain knowledge that uses
multivariate time series information. We show that this method improves the
accuracy of next-day stock quotes prediction when compared with the
traditional embed of historical values extracted from the original data.



1 Introduction 

Recently, several data mining techniques have been applied with success to time 
series prediction based on samples of historical data (e.g. [1], [5], [20]). Most of these 
approaches use supervised machine learning techniques and a data preparation stage 
that produces a set of examples in a two-dimension, tabular, “standard form” [28]. 
Several approaches can be used to prepare this kind of tabular representation from the 
time series raw data. Choosing the most appropriate set of features for the time series 
problem in hand can have a significant impact on the overall accuracy of the 
prediction models. This is the main problem addressed in this paper, for the particular 
case of financial time series prediction. 



1.1 Traditional Temporal Embed 

The simplest and most common procedure of transforming the original time series
data to produce a set of examples in tabular form is to use the last known values of the
time series as features describing each example, and to use the next value of the time
series as the respective target value of the example. This auto-regressive data
preparation technique is usually called “time-delay embedding” [22], “tapped delay
line” or “delay space embedding” [17].

To illustrate the basic temporal embedding, let us consider a univariate time series

$X(t) = \ldots, x_{t-2}, x_{t-1}, x_t, x_{t+1}, x_{t+2}, \ldots$

from which the values up to $x_t$ (that is, $\ldots, x_{t-1}, x_t$) are known. Let us further
consider as our objective the construction of a model to predict the next value of the
same time series ($x_{t+1}$), using some supervised data mining process. Assuming that this
depends at most on the $k$ previous values, traditional time embed approaches will
describe each example using $k$ features, each taking one of the $k$ previous
measurements of the time series ($x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-(k-1)}$). Each complete
example will thus have the form

$x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-(k-1)}, x_{t+1}$

where $x_{t+1}$ is the value of the target (dependent) variable.
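The construction of such examples can be sketched in a few lines of Python. The function name `time_delay_embed` and the representation of each example as a (features, target) pair are assumptions made for this illustration.

```python
def time_delay_embed(series, k):
    """Build (features, target) examples from a univariate series.

    features = [x_t, x_{t-1}, ..., x_{t-(k-1)}], target = x_{t+1}.
    """
    examples = []
    for t in range(k - 1, len(series) - 1):
        features = [series[t - j] for j in range(k)]   # most recent value first
        examples.append((features, series[t + 1]))
    return examples

for features, target in time_delay_embed([1, 2, 3, 4, 5], k=3):
    print(features, "->", target)
```

A series of length n yields n - k examples, since the first k - 1 values lack a full history and the last value has no known successor to serve as target.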

This kind of data preparation forms the basis of the classical autoregressive time
series prediction methods like AR [29] or ARMA [6], and is theoretically justified by
the Takens theorem [24]. This theorem states that, with some restrictions, a number of
(2·N)+1 past values is enough to reconstruct the model of a noiseless system with N
dimensions. In the case of the described X(t) time series, assuming absence of noise,
and considering the series to be generated by an N-dimensional system, the features
$x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-2N}$ would be enough for the extraction of the system
model and for predicting future values of the time series.

1.2 Limitations of the Temporal Embed 

It should be remarked that many real-life systems suffer complex interactions with 
other systems and, even if they have an intrinsic linear behavior, they tend to generate 
time series that present severe problems to comply with the Takens theorem 
restrictions. In particular, this theorem does not seem to apply to most financial time 
series, namely, to stock exchange time series. 

In effect, stock exchange time series are generated by extremely complex systems 
that involve the interaction of thousands or millions of independent agents (the 
investors) each capable of changing its behavior over short time frames according to a 
virtually limitless number of possible individual “states”, resulting in a system with a 
number of dimensions that, in practical terms, must be considered infinite [14]. This 
fact implies that the number of historical values needed to construct each example in a 
way conforming to the Takens theorem conditions would be unrealistic (the necessary 
data would not be available and would be unmanageable by the machine learning 
algorithms). Moreover, the system global behavior is not static over time (for 
instance, the number and individual characteristics of the intervening agents are not 
constant). This non-stationary behavior of the system means that “old” time series 
data may not be truly representative of the current system (in fact, it was generated by 
a related but different system that no longer exists) and using them blindly to generate 
predictions for the current system can be misleading.

These characteristics of stock exchange time series result in the almost unanimous
admittance that base data consisting solely of historical values of the time series being
predicted do not contain enough information to explain more than a fraction of the
variance in the time series. In fact, the Efficient Markets Theory [8], now somewhat
discredited ([1], [11], [15]) but still accepted as basically correct by many economists,
states that those data do not contain any information useful for the prediction of the
future behavior of the system.

The problems that impair the applicability of Takens Theorem to very complex 
dynamic systems do not seem to be reduced by the use of multivariate information as 
basis data for time series modeling. As an example, the original data we use in our 
experiments includes 7 values for each day and each stock. This corresponds to 7 time 
series that are available to be used as basic data for model construction. In this 
context, using all the 7 time series with an embed dimension of, say, 25 (obviously 
insufficient to include enough information to describe the system), would result in 
175 features describing each example. Even if those features were discretized to adopt 
a maximum of 5 different values, that would create a “representation space” with the 
capacity of distinguishing 5^175 different cases. This kind of input dimension is
obviously too large considering the number of training and testing examples available 
(we have 10 years of daily data corresponding to around 2400 records), and would 
result in a very sparsely populated space that would severely limit the efficiency of 
most machine learning algorithms [3], [23]. This problem is reinforced by the fact 
that, due to the system complexity and the lack of sufficient information on the base 
data, the system behaves as if each variable (dependent and independent) includes a 
strong noise component. Thus, since all the information present in the base data only 
explains a small proportion of the system variance and the information present in our 
175 variables will explain an even smaller proportion of that variance, we can expect 
each of those variables to have a very small individual relevance to the desired 
prediction. 
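The arithmetic behind this sparsity argument can be checked directly. The following is a minimal sketch that uses only the figures quoted above (7 series, an embed dimension of 25, 5 discrete values per feature, about 2400 records); the variable names are illustrative.

```python
# Figures quoted in the text; the variable names are illustrative only.
n_series = 7       # base time series available for each stock
embed_dim = 25     # embed dimension used in the example
n_levels = 5       # discrete values allowed for each feature
n_records = 2400   # approximate number of daily records

n_features = n_series * embed_dim    # features describing each example
n_cells = n_levels ** n_features     # distinguishable cases in the space

# Number of decimal digits of n_cells, as an order-of-magnitude indicator.
digits = len(str(n_cells))

# Upper bound on the fraction of the space that the records could occupy.
occupancy = n_records / n_cells

print(n_features, digits, occupancy)
```

Even if every record fell in a different cell, the occupied fraction of the representation space would be vanishingly small, which is the sparsity problem described above.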



1.3 Alternatives to the Temporal Embed 

In situations where heavy overfitting problems can be expected due to the sparsely 
populated representation space and when noisy data are associated with a possibly 
large number of input variables with low individual value to knowledge extraction, it
is frequently possible to obtain better results by combining several of the original 
variables [16], [19]. The aim of developing such feature combinations is the creation 
of a reduced set of “derivative” variables having a greater discriminative power for 
the modeling task being considered. 

One of the possible approaches to the development of a reduced set of derivative 
variables that contain most of the useful information present in the original data is the 
use of an automated method that searches for some combination of the original 
variables. The most used of such methods is “Principal Component Analysis” [4]. 
This method develops orthogonal linear combinations of the original variables and 
ranks them on the basis of their ability to “explain” the variance of the original variables. The
main problem with this approach is that the original variables are replaced by the 
most significant linear combinations and so, the data mining algorithms will no longer 
be able to search for non-linear combinations of the original variables. This approach
is therefore particularly appropriate for “data reduction” when only linear prediction
methods will be used, and on problems related to systems known from the outset to be
basically linear (clearly not the case of stock exchange time series [14], [31]). 

Another approach to feature construction consists in the “manual” development of 
a reduced set of derivative variables using domain knowledge. This approach is 
particularly efficient in domains where there is an available (but incomplete, thus not 
allowing a deterministic prediction) body of domain theory relating the data to the 
system mechanics and behavior. In stock exchange prediction there is no sure domain 
knowledge and, considering only historical stock exchange information as base data, 
the available domain theories (related to “technical analysis” of stocks [7], [18]) are 
particularly uncertain [10]. However, the large number of known technical analysis 
indicators¹ seems a good starting point to build derivative features. In effect, the
highly uncertain applicability of these indicators to individual stocks, markets, and 
time frames, limits them as final global theories, but does not prevent their usefulness 
to construct input features for machine learning algorithms, since most of these 
algorithms can filter out the features that prove to be least useful. 



2 Data Pre-processing in Financial Time Series Prediction 

A review of the published works on financial time series prediction shows that, in 
spite of the limitations mentioned in Section 1.2, most works use either the basic 
version of temporal embed or very limited transformations of this technique. 

As examples of the use of direct temporal embed, we can mention the use of a 
direct univariate embed of dimension two to predict particularly strong daily stock 
exchange quotes variations [20], or the use of a multivariate basic embed of daily 
stock quotes information as input data to a genetic algorithm used to conduct a direct 
search for trading criteria in [30]. 

Regarding the use of basic transformations of temporal embed we can refer to the
approach followed by [13], who employs the difference of the logarithms of the last
values of several time series to predict the stock quotes of several Japanese
companies, or the work of [15] where logarithmic transformations of the differences
of the last two and three known values of exchange rate time series are used to
predict the variation direction of those time series.

Interesting examples of more ambitious adaptations of the basic temporal embed 
can be found in two entries of the important time series prediction competition carried 
out in Santa Fe in 1992 [25]: Mozer tests several transformations involving different 
weights for the embedded values, in a univariate context related to the prediction of
exchange rates between the US dollar and the Swiss franc [17] and, for the prediction
of the same time series, Zhang and Hutchinson try a univariate embed using
non-consecutive past values for different prediction time frames [31].

Although not so frequent, it is also possible to find the use of more sophisticated 
variables derived from the base data. An early effort is the work described in [26], 



¹ Descriptions can be found in a variety of sources, for instance http://traders.com or
http://www.tradersworld.com.
where a neural network with 61 inputs is used for the prediction of the next day value
of the US dollar / German mark exchange rate. Forty-five of the 61 variables used in
this work correspond to an embed of the time series being modeled, while the other
16 variables are developed from multivariate data that the authors believe to be 
relevant for the prediction of the time series. The work presented in [27] is another 
example involving more sophisticated derivative variables and more complete 
multivariate base data. In this latter study, the objective is also the prediction of the 
US dollar / German mark exchange rate. The authors use 69 input variables for the 
neural network that produces the predictions, but none of those variables are the result 
of a direct embed of the time series being predicted. In fact, 12 of the 69 variables are 
“built” based on past values of the time series, using typical “technical analysis” 
transformations, and the other 57 variables reflect multivariate fundamental 
information exterior to the time series being predicted (using data associated to 
exchange rates between other currencies, interest rates, etc.). In these two works, the 
large number of variables “fed” to the machine learning algorithms will lead to 
overfitting problems and, in effect, both works include as main objectives the 
presentation of new techniques to reduce overfitting problems in neural networks. 



3 A System for Stock Quotes Prediction 

When the goal is the short-term (up to a week) prediction of stock quotes, the 
relevance of fundamental information (both micro- and macro-economic) tends to be
small. In effect, even if this kind of information proves useful to predict a part of the 
long-term variability of the stocks, the proportion of that ability with direct reflection 
on the variance of the next few days would be very small [11], [18]. Thus, it seems 
reasonable to believe that most of the short term variability of stock values is due to 
fluctuations related to the behavior of the investors, which technical analysis claims
can be inferred from past variations of the stock quotes and volumes. However,
there are still important problems to address, if an approach based on derived 
variables built with the help of domain knowledge is to be tried. These problems are 
related to the uncertain nature of the existing technical analysis “indicators”, and to 
the vast number of existing indicators (the direct use of a large number of derivative 
variables of this kind could lead to overfitting problems similar to those related to the 
use of a large direct embed [10]). 

An approach based on derived variables would benefit from a practical way of 
representing domain knowledge (in the form of technical analysis indicators) and also 
from an efficient way of generating variables from that representation. Also, given the 
large quantity of possible derived “technical variables” it may be advantageous to 
perform some kind of automatic filtering of the less relevant features. To try to 
circumvent the problems associated with this approach, we have developed an 
integrated prediction system that includes a knowledge representation language that 
allows the direct description of most technical analysis indicators using pre-defined 
language elements. Based on these descriptions the system is able to automatically 
generate from the raw data the features described by the user. This general set-up 
could be used with any machine learning algorithm that copes well with noisy data.
We chose to test this framework with a k-nearest neighbor algorithm [5], and with a
regression version of the PA3 rule induction algorithm [2].

The k-NN algorithm uses a distance metric that involves a ranking of feature
importance, while the rule induction algorithm does not need such ranking. However, 
due to the domain characteristics mentioned before, this system also gains from a 
reduction of the number of features. As such, our prediction system includes a feature 
ranking and selection step. The complete system architecture is shown in Figure 1.

The domain knowledge representation language allows a domain expert to define 
technical analysis indicators using a simple but flexible representation. As examples, 
consider the following instructions of this language: 
    percent(clo(i),clo(i+1)); i=0..3
    ratio(moa(vol,3,0),moa(vol,15,0))

The first of these instructions describes 4 features, each corresponding to the 
percent variation between the last four daily closing values of a stock and the 
respective previous closing value. The second instruction describes a single feature 
corresponding to the ratio between the moving average of the last 3 daily trading 
volumes of the stock and the moving average of the last 15 daily trading volumes².
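To make the semantics of these two instructions concrete, the sketch below re-creates them in Python. It is not the system's implementation: the helper `lagged` and the exact formula used for `percent` are assumptions made for illustration, and the closing values and volumes are invented.

```python
from statistics import mean

def lagged(series, i):
    """Value i days before the most recent one (i = 0 is the latest)."""
    return series[-1 - i]

def percent(a, b):
    """Percent variation of a relative to b (assumed definition)."""
    return 100.0 * (a - b) / b

def moa(series, n, lag):
    """Mean of n consecutive values, the most recent being `lag` days old."""
    end = len(series) - lag
    return mean(series[end - n:end])

def ratio(a, b):
    return a / b

clo = [10.0, 10.2, 10.1, 10.4, 10.3, 10.5]   # hypothetical closing values
vol = [float(v) for v in range(100, 120)]    # hypothetical volumes (20 days)

# percent(clo(i),clo(i+1)); i=0..3  -> four features
pct_features = [percent(lagged(clo, i), lagged(clo, i + 1)) for i in range(4)]

# ratio(moa(vol,3,0),moa(vol,15,0)) -> a single feature
vol_feature = ratio(moa(vol, 3, 0), moa(vol, 15, 0))

print(pct_features, vol_feature)
```

A single instruction with an index range therefore expands into several columns of the example table, while an instruction without a range yields exactly one column.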




Fig. 1. The complete time-series prediction system 



The features defined with the language are automatically generated from the raw 
data and appended with the respective target value for each generated example, to 
form a set of examples in the tabular “normal form”. 

The resulting features can have very different discrete or continuous values. 
Discrete features can have ordered integer values (for instance, consider a feature that 
counts the number of times the closing value is higher than the opening value during 
the last 20 trading days), but can also have non-ordered values (e.g. a feature that 
represents the weekday of the last trading session). Continuous features can also have 
very different ranges of values. Given these different characteristics a feature 



² The third parameter in the moa constructor specifies the number of days between the last
value used in the moving average and the present reference example. This way, the
expression ratio(moa(vol,3,0),moa(vol,3,1)) relates two 3-day moving averages of the
volumes (the first using the last 3 values and the second using the first 3 of the last 4 values).
discretization module is run before the set of examples is “fed” to the data mining
algorithms. This module discretizes each feature into 5 values using a traditional 
approach [9] that is known to behave well over noisy data. This discretization 
technique works individually over each feature. It starts by analyzing the examples of 
the training set and then divides the range of values of each feature into five intervals 
with the same number of values in each interval. Each of these intervals is represented 
by an integer value ranging from 1 to 5, and the examples are discretized into one of 
these 5 discrete values accordingly. This simple approach to feature discretization 
produced robust results over our data. As some algorithms are able to work with
non-discretized data, we have checked whether by using this discretization step some
information was being lost. We have not noticed any significant accuracy differences 
in these tests and thus we have decided to work only with the discretized data. 
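The equal-frequency discretization described above can be sketched as follows. This is an illustrative version, not the procedure of [9] itself: the handling of ties and of values outside the training range is an assumption, and the function names are hypothetical.

```python
def fit_cut_points(train_values, n_bins=5):
    """Cut points splitting the sorted training values into n_bins groups
    holding (nearly) the same number of values."""
    ordered = sorted(train_values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

def discretize(value, cut_points):
    """Map a value to an integer code from 1 to len(cut_points) + 1."""
    return 1 + sum(value >= c for c in cut_points)

# Hypothetical continuous feature values from a training set.
train = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.9, -2.0, 1.8, 0.5]
cuts = fit_cut_points(train)
codes = [discretize(v, cuts) for v in train]

print(cuts, codes)
```

The cut points are learned from the training set only and then reused for later examples, so values beyond the training range simply fall into the first or the last interval.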

After generating the training examples, our prediction system starts the core data 
mining step with a feature ranking and selection module. The feature selection 
procedure aims to reduce the overfitting problems related to the difficult domain 
characteristics mentioned in Section 1.2. In effect, some algorithms that use 
representation languages with larger descriptive power, are able to fit almost perfectly 
the hypersurface defined by the training data. This is the case of our rule induction PA 
algorithm. These methods would have a tendency to find existing but non-meaningful 
statistical fluctuations in the training data if given enough irrelevant features. To
prevent this kind of overfitting, we include a feature ranking and filtering step in our 
system. This step uses a combination of the following feature relevance measures: the 
information gain; the Pearson’s r correlation between each feature and the target 
values; and a powerful feature relevance metric that evaluates the features in the 
context of the other features [12]. The first two of these metrics look at each feature
individually, while the third considers feature interactions. These three measures are 
combined to form an overall ranking, using a direct averaging of the three 
independent feature rankings. Regarding the selection of features we simply retain the 
highest ranking features up to a certain number³.
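The combination of rankings by direct averaging can be illustrated with a short sketch. The three score tables below are invented, and the code assumes that each relevance measure has already been computed for every feature; it only shows the rank-averaging and selection steps.

```python
def ranks(scores):
    """Rank 1 is assigned to the most relevant (highest scoring) feature."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def select_features(score_tables, n_keep):
    """Average the per-measure ranks and keep the n_keep best features."""
    per_measure = [ranks(table) for table in score_tables]
    names = list(score_tables[0])
    avg_rank = {f: sum(r[f] for r in per_measure) / len(per_measure)
                for f in names}
    return sorted(names, key=avg_rank.get)[:n_keep]

# Hypothetical relevance scores for four features under three measures.
info_gain = {"f1": 0.30, "f2": 0.10, "f3": 0.20, "f4": 0.05}
abs_pearson = {"f1": 0.25, "f2": 0.05, "f3": 0.40, "f4": 0.10}
contextual = {"f1": 0.50, "f2": 0.20, "f3": 0.45, "f4": 0.10}

kept = select_features([info_gain, abs_pearson, contextual], n_keep=2)
print(kept)
```

Averaging ranks rather than raw scores avoids having to put the three measures on a common scale.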

The k-NN algorithm we have used in our prediction system is inspired by
Bontempi’s work [5]. It uses an orthogonal “Manhattan” distance metric that weights
the features differently and a linear kernel function that gives greater weights to the
training examples found to be nearest to the test example being predicted. After
several tests over stock exchange time series⁴, we opted to use a fixed neighborhood
of 150 cases. This considerable number of neighbors seems to result in a good balance
between bias and variance and has an important effect in the reduction of overfitting
problems typically associated with the very noisy example sets expected in stock
exchange time series prediction. 
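A prediction of this kind can be sketched as below. This is only an approximation of the idea under stated assumptions: the feature weights are given, the kernel bandwidth is set slightly above the distance to the farthest neighbor (an arbitrary choice made here), and the toy data use 3 neighbors instead of the 150 used in our system.

```python
def weighted_manhattan(a, b, weights):
    """Manhattan distance with one weight per feature."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def knn_predict(train_x, train_y, query, weights, k):
    """Kernel-weighted mean of the targets of the k nearest training cases."""
    nearest = sorted(
        (weighted_manhattan(x, query, weights), y)
        for x, y in zip(train_x, train_y)
    )[:k]
    d_max = nearest[-1][0]
    if d_max == 0:  # every selected neighbor coincides with the query
        return sum(y for _, y in nearest) / len(nearest)
    bandwidth = 1.01 * d_max  # assumed: farthest neighbor keeps a small weight
    kernel = [1.0 - d / bandwidth for d, _ in nearest]  # linear kernel
    return sum(w * y for w, (_, y) in zip(kernel, nearest)) / sum(kernel)

# Hypothetical discretized examples, targets, and feature weights.
train_x = [[1, 1], [2, 2], [3, 3], [5, 5]]
train_y = [0.5, -0.2, 0.1, 1.0]
feature_weights = [1.0, 0.5]

prediction = knn_predict(train_x, train_y, [2, 2], feature_weights, k=3)
print(prediction)
```

With a large neighborhood the prediction is an average over many noisy cases, which lowers variance at the cost of some bias.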

The rule induction algorithm that was also used in our prediction system is a 
regression variant of a general-purpose sequential-cover algorithm called PA3, which
handles two-class problems. This algorithm was developed to be efficient over noisy 
data, and performed well when compared with other rule induction algorithms over 
several noisy data sets (including stock exchange time series prediction) [2]. PA3 



³ In this work we have used 10 features in all tests.

⁴ This parameter tuning was carried out using only the sub-set of stock data that was used for
training purposes according to the partitioning that will be described in Section 4.
induces an ordered list of “if...then...” rules. Each rule has the form “if <complex>
then predict <class>”, where <complex> is a conjunction of feature tests, the “selectors”.
In PA3, each selector implies testing a feature to see if its value is included in a
specified range of values. The postcondition of a PA3 rule is a single Boolean value
that specifies the class that the rule predicts. To produce a regression (numeric)
prediction instead of a class prediction, we have used the very simple approach of
assigning as postcondition of each rule the mean target value of the training examples
covered by the rule. This variant of the original PA3 algorithm was also adapted to
accept (and to use in the rule evaluation) numeric values instead of class labels in the 
training examples. 
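The regression postcondition can be illustrated with a small sketch. It does not reproduce the PA3 rule induction itself: the rules are supplied by hand, the dictionary representation of a rule is hypothetical, and a default prediction (the overall training mean) is assumed for examples that no rule covers.

```python
from statistics import mean

def covers(rule, example):
    """rule maps a feature index to an inclusive (low, high) range test."""
    return all(low <= example[f] <= high for f, (low, high) in rule.items())

def fit_postconditions(rules, train_x, train_y):
    """Attach to each rule the mean target of the training cases it covers."""
    default = mean(train_y)
    fitted = []
    for rule in rules:
        covered = [y for x, y in zip(train_x, train_y) if covers(rule, x)]
        fitted.append((rule, mean(covered) if covered else default))
    return fitted, default

def predict(fitted, default, example):
    """Ordered rule list: the first rule that covers the example fires."""
    for rule, value in fitted:
        if covers(rule, example):
            return value
    return default

# Hypothetical discretized training data and two hand-written rules.
train_x = [[1, 5], [2, 4], [4, 2], [5, 1]]
train_y = [1.0, 0.6, -0.4, -0.8]
rules = [{0: (1, 2)}, {0: (4, 5), 1: (1, 2)}]

fitted, default = fit_postconditions(rules, train_x, train_y)
print(predict(fitted, default, [2, 3]), predict(fitted, default, [3, 3]))
```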

These two algorithms were chosen to be integrated in our prediction system not
only because they tend to be efficient over noisy data, but also because they use
totally different algorithmic approaches. This is an important issue since we wanted to
test the validity of our different data pre-processing methods. In effect, it is
conceivable that some data preparation approaches lead to example sets better suited
for specific machine learning algorithms, but less efficient when used by others.



4 Experimental Testing 

The main goal of this paper is to show that the use of domain knowledge to generate 
derivative features from the raw data leads to better predictive accuracy results than 
the traditional embed approach, within financial time series prediction problems. To 
test this hypothesis, we have carried out a set of comparative experiments. In our 
experiments we have used time series data concerning five of the more actively traded 
companies listed in the Portuguese BVLP stock exchange: “BCP”, “Brisa”, “Cimpor”, 
“EDP” and “PT”. The base data consists of 855 daily records for each of the five 
companies (from 25 November 1997 to 11 May 2001). Each of those daily records 
includes 7 base variables: the date of the day; the closing value for the BVLP30 stock 
index; the volume of traded stocks; and the opening, maximum, minimum and closing 
values of the stock. 

Regarding the methodology used to compare the different alternatives considered 
in this work we have used the following strategy. The available data for each 
company was split in four sections. The first 50 records were kept aside for the 
construction of the first processed example (thus allowing the use of embed 
dimensions of up to 50). The next 400 records were used to generate 400 training 
examples according to the three strategies that will be compared in this paper. The 
following 400 records were used to construct 400 testing examples. The last 5 records 
were kept aside to allow the use of target values up to five days ahead in the
examples. 

With respect to the method used to obtain predictions for the 400 test cases we 
have used the following strategy. Initially all learning algorithms were given access to 
the first 400 examples. With this set of examples a prediction is made for the first test 
case. After this prediction is obtained, this test example is added to the set of training 
cases leading to a new training set, now with 401 examples, which is used to obtain a 
new model. This iterative train+test process (sometimes known as sliding window)
continues until a prediction is obtained for all 400 test examples. This means that for
instance the prediction for the last test example is obtained with a model that was 
learned using 799 training cases (the original 400 training examples, plus the first 399 
test examples). 
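The iterative procedure just described can be sketched as a loop in which the training set grows by one example after each prediction. The `fit` and `predict` arguments stand for any learning algorithm; the toy learner used here (predicting the mean training target) is purely hypothetical.

```python
def growing_window_predictions(examples, targets, n_train, fit, predict):
    """Predict each test case with a model trained on all earlier examples."""
    predictions = []
    for i in range(n_train, len(examples)):
        model = fit(examples[:i], targets[:i])  # i = 400, 401, ..., 799 in the paper
        predictions.append(predict(model, examples[i]))
    return predictions

def fit_mean(xs, ys):
    """Toy learner: the model is just the mean of the training targets."""
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

# The split described in the text accounts for all 855 daily records.
assert 50 + 400 + 400 + 5 == 855

xs = [[0]] * 6                      # feature values are irrelevant to the toy learner
ys = [1.0, 3.0, 2.0, 6.0, 3.0, 9.0]
preds = growing_window_predictions(xs, ys, n_train=4,
                                   fit=fit_mean, predict=predict_mean)
print(preds)
```

Each test case is predicted before its true target is added to the training data, so no prediction uses information from the future.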

Given the goal of our comparative experiments, we have used a prediction horizon 
of one single day. Regarding the target variable (the time series to predict) we have
chosen an average of the next day quotes of each stock. This average consists of the 
mean of the opening, maximum, minimum and closing values of the stock during the 
following day. We called this average the “reference value” of the stock. More 
precisely, we predict the percentage change between the last known reference value 
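and the next day reference value,

\[ y(t) = 100 \times \frac{Ref(t) - Ref(t-1)}{Ref(t-1)} \qquad (1) \]

where $Ref(t) = \frac{Open(t) + Max(t) + Min(t) + Close(t)}{4}$.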
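As a small worked example of the target variable, the sketch below computes the reference value of two consecutive days and the percentage change between them. The quotes are invented and the function names are illustrative.

```python
def reference_value(opening, maximum, minimum, closing):
    """Mean of the four daily quotes of a stock."""
    return (opening + maximum + minimum + closing) / 4.0

def percent_change(ref_prev, ref_next):
    """Percentage change from the last known to the next reference value."""
    return 100.0 * (ref_next - ref_prev) / ref_prev

# Hypothetical quotes for two consecutive trading days.
ref_today = reference_value(10.0, 10.4, 9.8, 10.2)       # about 10.1
ref_tomorrow = reference_value(10.2, 10.6, 10.0, 10.4)   # about 10.3
target = percent_change(ref_today, ref_tomorrow)         # a rise of about 2%

print(ref_today, ref_tomorrow, target)
```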

The reason for the use of this “reference value” instead of using, for example, the 
closing value of each day, is related to the nearly stochastic nature of these time 
series. In effect, this kind of time series behaves as if it had an important component 
of added white noise. Thus, every single observation of the time series is affected by a 
random amount of noise, which tends to “drown” the relatively small variations of the 
stock values attributable to the system “real” behavior ([8], [10]). The use of this 
reference value results in two main advantages when compared with the use of a 
single daily point value of the stock quotes (for instance the closing value). The first, 
and most important, is the reduction in the proportion of the variance that results from 
random noise in relation to the variance resulting from the underlying system
behavior. The second advantage is related to the usefulness of the predicted values for 
trading. To illustrate this latter advantage, let us suppose that we predict a 1% rise in 
the reference value for the next day. Based on that prediction, we could place a sell 
order with a (minimum) sell value equal to the predicted reference value. In this 
situation, we can attain the sell value during the next day even if the predicted rise for 
the reference value falls short, since, in a typical day (one with variability in the stock 
value) the maximum value for the day must be higher than the reference value. 

We have used our prediction system to generate and compare three alternative 
ways of generating sets of examples for each of the 5 stocks considered in this paper. 
In the first approach, that will be labeled as “Data Set 1” in the tables of results, we 
have used a direct embed of the time series we are predicting. Namely, we developed 
25 features that represent the percent variations between the last 25 consecutive pairs 
of values of the reference values time series (this implies using information associated
with the last known 26 values of each reference values time series). The second
approach compared in our study (that we will refer to as “Data Set 2”) uses a similar 
direct embed strategy but now considering several of the time series available for each 
stock. In this alternative we have also used 25 features, which include the 5 last 
known percent changes of the reference values time series, and the 4 last known 
percent changes of the volume, opening, maximum, minimum and closing values time
series⁵. Finally, in the third alternative data preprocessing approach, that will be
labeled as “Data Set 3”, we developed 25 features based on domain knowledge. This
knowledge was used both to generate and to select the final set of features. Since 
there are literally hundreds of technical indicators applicable to our base data⁶, the
development of this type of features requires some restricting choices. We have used 
a process to choose the final set of features where some of them were fixed from the 
start, while others went through an automatic feature selection process. Among the 
former we included the first 4 terms of the simple embed used in the first data 
preprocessing approach. Regarding the automatic selection of the other features, we 
used the knowledge representation language to describe and generate a set of features 
that seemed relevant, and then used a feature ranking process⁷ to discard those with
lowest evaluations. In order to increase the statistical significance of this selection
process we have used the combined 2000 training examples (400 training examples ×
5 stocks).

It is interesting to analyze which were the features that resulted from the process 
described above. It was rather surprising to find out that most of the retained features 
were not the “full” technical analysis indicators but “constructive elements” of those
indicators. An example is the retention of simple ratios of moving averages, instead of
more complex variants of the Moving Average Convergence Divergence (MACD)
indicator that were also considered as candidate features.

The final set of 25 chosen features can be grouped as follows: 

• One feature based on the date time series, stating the weekday of the example. 

• Six features based on the reference values time series, four of them embeds of 
the differences, the other two relating moving averages of different lengths. 

• Three features based on direct embeds of differences of the BVL30 time series. 

• Two features based on the differential changes of the reference values and 
BVL30 time series, during two different time frames. 

• Three features based on the volume of trade time series, one of them an embed 
of the differences, the other two relating moving averages of different lengths. 

• Five features based on different relations of the opening, maximum, minimum, 
closing and reference values of the last known day. 

• Five features also based on relations of the opening, maximum, minimum, 
closing and reference values, but using moving averages of different lengths. 

Overall, among the 25 developed features, the oldest historical values used are 
reference values up to 20 days old and traded volumes up to 15 days old. 

In order to evaluate the prediction performance of the three alternative ways of pre- 
processing the raw stock data we used two common prediction error measures [10], 



⁵ With the help of our domain knowledge representation language it is quite simple to represent
these two first sets of features. The first is represented by the single expression
percent(ref(i),ref(i+1)) with i=0..24. The second is represented by 6 very similar
expressions, one for each of the 6 base data time series involved.

⁶ Many of them involving a choice of parameters and thus greatly increasing the number of
possible variants.

⁷ This process is similar to the one used in the data mining component of our system described
in Section 3.
chosen for their relevance for trading criteria evaluation. The first is the percent
accuracy of the binary (rise or fall) predictions for the next day, defined as, 

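\[ \%Acc = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} Score(y_i, \hat{y}_i) \qquad (2) \]

where $N_{test}$ is the number of test cases; $y_i$ is the true target value of test case i; $\hat{y}_i$ is
the predicted value for that test case; and

\[ Score(y, \hat{y}) = \begin{cases} 1 & \text{if } y \ge 0 \wedge \hat{y} \ge 0 \\ 1 & \text{if } y < 0 \wedge \hat{y} < 0 \\ 0 & \text{otherwise} \end{cases} \]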



The second evaluation measure we have used is the average value of the return on 
the test examples, taken as positive when the prediction sign is correct and as negative 
when it is wrong⁸. We called this measure the Average Return on the Predicted
Examples (ARPE), which can be defined as,



\[ ARPE = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} |Ref_i - Ref_{i-1}| \, sign(y_i \hat{y}_i) \qquad (3) \]

where $N_{test}$ is the number of test cases; $y_i$ is the true target value of case i; $\hat{y}_i$ is the
predicted value for that case; $Ref_i$ is the reference value of case i.
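Both evaluation measures are simple to compute once the true and predicted values are available. The following sketch is illustrative only: the accuracy is expressed as a percentage (multiplied by 100, as in the tables of results), the magnitude of each return is passed in explicitly, and the sample values are invented.

```python
def score(y_true, y_pred):
    """1 when the predicted and the true direction agree, 0 otherwise."""
    agree = (y_true >= 0 and y_pred >= 0) or (y_true < 0 and y_pred < 0)
    return 1 if agree else 0

def sign(x):
    return (x > 0) - (x < 0)

def percent_accuracy(y_true, y_pred):
    hits = sum(score(t, p) for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)

def arpe(return_sizes, y_true, y_pred):
    """Mean return size, counted as positive when the sign was predicted
    correctly and as negative when it was not."""
    signed = [abs(r) * sign(t * p)
              for r, t, p in zip(return_sizes, y_true, y_pred)]
    return sum(signed) / len(signed)

# Hypothetical true and predicted percent changes for four test cases.
y_true = [1.0, -0.5, 0.8, -1.2]
y_pred = [0.3, 0.2, 0.1, -0.4]

acc = percent_accuracy(y_true, y_pred)
avg_return = arpe(y_true, y_true, y_pred)   # return sizes taken from y_true here
print(acc, avg_return)
```

Unlike the accuracy, the ARPE penalizes a wrong direction more heavily on days with large movements, which is why it is closer to a trading criterion.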



Tables 1 and 2 show the accuracy and ARPE results obtained in our experiments 
using the Lazy3 and PA6 algorithms over the three data sets generated with the 
alternative pre-processing methods, which were described earlier in this Section. 
These tables also include the results obtained with two traditional benchmarks: the 
Naive Prediction of Returns (NPR); and the buy-and-hold strategy⁹. The NPR merely
predicts that the next percentage change of the reference value (the return of this
value), will be the same as the last known change (i.e. ŷ(t+1) = y(t)). The results for
this benchmark are presented in Table 1 as they are directly comparable with the other
scores shown on this table. The results for the buy-and-hold benchmark are shown in
Table 2 as the average gain of the buy-and-hold strategy¹⁰ over the 400 trading days
involved in our tests, so that they are directly comparable with the other ARPE 
measure values presented. 
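The NPR benchmark amounts to shifting the series of returns by one step. A minimal sketch with invented returns follows; the directional accuracy is computed with the same sign convention used for the other models.

```python
def naive_prediction_of_returns(returns):
    """The prediction for step t+1 is the return observed at step t."""
    return returns[:-1]  # aligned with the actual values returns[1:]

# Hypothetical daily percent changes of a reference value.
returns = [0.4, 0.1, -0.3, -0.2, 0.5]

predicted = naive_prediction_of_returns(returns)
actual = returns[1:]

hits = sum((a >= 0) == (p >= 0) for a, p in zip(actual, predicted))
npr_accuracy = 100.0 * hits / len(actual)
print(predicted, npr_accuracy)
```

A series with positive auto-correlation at lag 1 tends to repeat the direction of the previous day, so this benchmark can score above 50% without any learning.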

The results shown on these tables provide strong empirical evidence towards our 
hypothesis concerning the use of domain knowledge to generate the training 
examples. In effect, considering the results over the 5 stocks, we can observe that the 
models obtained with data set 3 (that contain the features obtained with domain 
knowledge), consistently achieve better accuracy results than the others, 
independently of the algorithm used to obtain the model. When comparing the results 



⁸ Notice that the target values of each example are the percent changes between the last known
reference value and the reference value of the next day.

⁹ An extended discussion of these benchmarks can be found in [10] and [21].

¹⁰ This strategy consists of buying at the price of the last known training case, and selling
at the price of the last known reference value, $Ref_{50+400+400}$.
between data sets 1 and 2, we also observe worse results with the simple embeds
(with a single exception in the BCP stock using the PA6 algorithm), than with embeds 
of multiple time series. If we consider the ARPE evaluation measure the conclusions 
are similar. An interesting point to notice is the fact that all the algorithm / data sets 
combinations produce positive results (accuracy results greater than 50% and ARPE 
results greater than 0). 



Table 1. Percent accuracy over the test examples

                  Lazy3 algorithm                      PA6 algorithm
          Data Set 1  Data Set 2  Data Set 3  Data Set 1  Data Set 2  Data Set 3         NPR
BCP            57.50       62.75       69.00       56.50       56.50       62.75       65.50
Brisa          54.75       62.00       66.50       57.75       63.25       63.75       56.75
Cimpor         59.75       63.00       69.25       56.00       57.00       64.50       58.50
EDP            58.50       65.75       72.25       58.00       67.75       72.50       59.00
PT             57.50       61.75       68.50       57.50       63.50       67.75       57.50
Average        57.60       63.05       69.10       57.15       61.60       66.25       59.45



Table 2. Average Return on the Predicted Examples (ARPE)

                  Lazy3 algorithm                      PA6 algorithm
          Data Set 1  Data Set 2  Data Set 3  Data Set 1  Data Set 2  Data Set 3  Buy-and-hold
BCP            0.134       0.253       0.286       0.193       0.215       0.215         0.002
Brisa          0.176       0.286       0.394       0.228       0.312       0.333         0.099
Cimpor         0.266       0.404       0.498       0.192       0.386       0.447         0.141
EDP            0.263       0.515       0.578       0.302       0.531       0.591         0.010
PT             0.494       0.762       1.045       0.482       0.922       1.007         0.113
Average        0.267       0.444       0.560       0.279       0.473       0.519         0.073



Analyzing the benchmark results in isolation, we notice that the NPR accuracy
results are considerably above the 50% average, which is somewhat unexpected. This
is due to the high auto-correlation (at lag 1) of the five time series in analysis (which
also helps to explain the prediction ability of data set 1, as used in our system)¹¹.
Comparing our results with those achieved by the benchmarks, we notice that all the
algorithm / data sets combinations achieve considerably better ARPE values than the
buy-and-hold benchmark. On the other hand, the NPR accuracy results are globally
higher than the results of the simple embed used in data set 1, although they are worse
than the results of data set 2 and, particularly, of data set 3.

In order to assert the statistical significance of these results, we have carried out 
one-sided paired t tests over the 400 test examples of each stock, using the accuracy 
scores. The accuracy differences between models obtained with data set 1 and data set 
3 were observed to be highly significant. In effect, with the Lazy 3 algorithm all the 
differences are statistically significant with a 99.5% confidence level, except for BCP 
where the level was 97.5%. Regarding the PA6 algorithm, the improvements obtained 



" An analysis of those time series showed no significant auto-correlation at other lags, limiting 
the potential effectiveness of more powerful prediction techniques based in auto-correlation. 
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with data set 3 are somewhat less marked, but still significant with 99.5% confidence 
for Cimpor, EDP, and PT, with 97.5% confidence for BCP, and with 95% for Brisa. 
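
As a rough illustration of the kind of test described above, the following sketch computes the
paired t statistic for per-example accuracy scores (1 = correct direction, 0 = wrong) of two
models evaluated on the same test examples. The function name and the toy data are
illustrative assumptions, not the authors' code; the resulting statistic would still have to be
compared with a one-sided critical value of the t distribution with n - 1 degrees of freedom.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """Return the t statistic for the mean of the paired differences a - b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / n)


# Hypothetical toy data: model A (e.g. data set 3) versus model B (e.g. data set 1).
t_value = paired_t_statistic([1, 1, 1, 0], [0, 1, 0, 0])
print(round(t_value, 3))
```

A large positive statistic supports the one-sided hypothesis that the first model is more
accurate than the second on the same examples.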



5 Conclusions 

This paper described an approach to financial time series prediction whose main 
distinctive feature is the use of domain knowledge to generate the training cases that 
form the basis of model construction. With this purpose we have developed a domain 
knowledge representation language that allows the time series expert to easily express 
his knowledge. The description obtained with the help of this language is then used by 
our prediction system to generate a training sample from the raw time series data.

We have contrasted this knowledge intensive approach with the traditional method 
of embedding the last time series values. With this purpose we have carried out a 
series of comparative experiments using time series data from five actively traded 
Portuguese companies. The results of our experiments with these stocks provide 
strong empirical evidence that a data preprocessing approach based on a direct embed 
of the time series has large limitations, and that the use of features constructed from 
the available raw data with the help of domain knowledge is advantageous. 

Regarding future developments of this research, we intend to extend the
experimental evaluation to prediction horizons longer than the next day, as well as
to further refine the important feature selection phase.



References 

1. Abu-Mostafa, Y., LeBaron, B., Lo, A. and Weigend, A. (eds.): Proceedings of the Sixth 
International Conference on Computational Finance, CF99. MIT Press (1999) 

2. Almeida, P. and Bento, C.: Sequential Cover Rule Induction with PA3. Proceedings of the
10th International Conference on Computing and Information (ICCF2000), Kuwait.
Springer-Verlag (2001)

3. Bellman, R.: Adaptive Control Processes: A Guided Tour. Princeton University Press
(1961)

4. Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press (1995) 

5. Bontempi, G.: Local Learning Techniques for Modeling, Prediction and Control. Ph.D. 
Dissertation, Universite Libre de Bruxelles, Belgium (1999) 

6. Box, G. and Jenkins, G.: Time Series Analysis, Forecasting and Control. Holden-Day
(1976)

7. Demark, T.: The New Science of Technical Analysis. John Wiley & Sons (1994) 

8. Fama, E.: Efficient Capital Markets: A review of Theory and Empirical Work. Journal of 
Finance, 25 (1970) 383-417 

9. Hall, M.: Correlation-Based Feature Selection for Machine Learning. Ph.D. Dissertation, 
Department of Computer Science, University of Waikato (1999) 

10. Hellstrom, T.: Data Snooping in the Stock Market. Theory of Stochastic Process 5(21) 
(1999) 

11. Herbst, A.: Analyzing and Forecasting Futures Prices. John Wiley & Sons (1992) 







12. Hong, S.: Use of Contextual Information for Feature Ranking and Discretization. IEEE
Transactions on Knowledge and Data Engineering, 9(5) (1997) 718-730

13. Hutchinson, J.: A Radial Basis Function Approach to Financial Time Series Analysis.
Ph.D. Dissertation, Department of Electrical Engineering and Computer Science,
Massachusetts Institute of Technology (1994)

14. Kustrin, D.: Forecasting Financial Time series with Correlation Matrix Memories for 
Tactical Asset Allocation. Ph.D. Dissertation, Department of Computer Science, University 
of York, UK (1998) 

15. Lawrence, S., Tsoi, A. and Giles C.: Noisy Time Series Prediction using Symbolic 
Representation and Recurrent Neural Network Grammatical Inference. Technical Report 
UMIACS-TR-96-27 and CS-TR-3625, Institute for Advanced Computer Studies, 
University of Maryland, MD (1996) 

16. Michalski, R.: A Theory and Methodology of Inductive Learning. In Michalski, R., 
Carbonell, J., and Mitchell, T., (eds): Machine Learning: An Artificial Intelligence 
Approach, Vol. 1. Morgan Kaufmann (1983) 

17. Mozer, M.: Neural Net Architectures for Temporal Sequence Processing. In: Weigend, A. 
and Gershenfeld, N. (eds.): Time Series Prediction: Forecasting the Future and 
Understanding the Past. Addison-Wesley (1994) 

18. Murphy, J.: Technical Analysis of the Financial Markets: A Comprehensive Guide to 
Trading Methods and Applications. Prentice Hall (1999) 

19. Murthy, S., Kasif, S. and Salzberg, S.: A System for Induction of Oblique Decision Trees. 
Journal of Artificial Intelligence Research, 2 (1994) 1-32 

20. Povinelli, R.: Time Series Data Mining: Identifying Temporal Patterns for Characterization 
and Prediction of Time Series Events. Ph.D. Dissertation, Marquette University, 
Milwaukee, Wisconsin (1999) 

21. Refenes, A.: Testing Strategies and Metrics. In Refenes, A. (ed.): Neural Networks in the 
Capital Markets. John Wiley & Sons (1995) 

22. Sauer, T., Yorke, J. and Casdagli, M.: Embedology. Journal of Statistical Physics 65 (1991) 
579-616 

23. Scott, D.: Multivariate Density Estimation. John Wiley & Sons (1992)

24. Takens, F.: Detecting Strange Attractors in Turbulence. In Rand, D. and Young, L. (eds.). 
Lecture Notes in Mathematics, Vol. 898. Springer (1981) 366-381 

25. Weigend, A. and Gershenfeld, N. (eds.): Time Series Prediction: Forecasting the Future and 
Understanding the Past. Addison-Wesley (1994) 

26. Weigend, A., Huberman, B. and Rumelhart, D.: Predicting Sunspots and Exchange Rates
with Connectionist Networks. In Casdagli, M. and Eubank, S. (eds.): Nonlinear Modeling
and Forecasting, SFI Studies in the Sciences of Complexity. Addison-Wesley (1992)

27. Weigend, A., Zimmermann, H. and Neuneier, R.: Clearning. In Refenes, P., Abu-Mostafa,
Y., Moody, J. and Weigend, A. (eds.): Neural Networks in Financial Engineering
(Proceedings of NNCM'95). World Scientific (1996)

28. Weiss, S. and Indurkhya, N.: Predictive Data Mining: A Practical Guide. Morgan 
Kaufmann (1998) 

29. Yule, G.: On a Method of Investigating Periodicities in Disturbed Series with Special 
Reference to Wolfer's Sunspot Numbers. Phil. Trans. Royal Society, Series A, 226 (1927) 

30. Yuret, D. and Maza, M.: A Genetic Algorithm System for Predicting the OEX. Technical
Analysis of Stocks and Commodities, 12(6) (1994) 255-259

31. Zhang, X. and Hutchinson, J.: Simple Architectures on Fast Machines: Practical Issues in
Nonlinear Time Series Prediction. In Weigend, A. and Gershenfeld, N. (eds.): Time Series
Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley (1994)




Optimizing the Sharpe Ratio 
for a Rank Based Trading System 



Thomas Hellstrom 



Department of Computing Science 
Umea University, 901 87 Umea, Sweden 
thomash@cs.umu.se
http://www.cs.umu.se/~thomash



Abstract. Most models for prediction of the stock market focus on 
individual securities. In this paper we introduce a rank measure that 
takes into account a large number of securities and grades them according 
to the relative returns. It turns out that this rank measure, besides being 
more related to a real trading situation, is more predictable than the 
individual returns. The ranks are predicted with perceptrons with a step 
function for generation of trading signals. A learning decision support 
system for stock picking based on the rank predictions is constructed. 
An algorithm that maximizes the Sharpe ratio for a simulated trader 
computes the optimal decision parameters for the trader. The trading 
simulation is executed in a general purpose trading simulator ASTA. 
The trading results from the Swedish stock market show significantly
higher returns and also Sharpe ratios, relative to the benchmark.



1 Introduction 

The returns of individual securities are the primary targets in most research that
deals with the predictability of financial markets. In this paper we focus on the
observation that a real trading situation involves not only attempts to predict the 
individual returns for a set of interesting securities, but also a comparison and 
selection among the produced predictions. What an investor really wants to have 
is not a large number of predictions for individual returns, but rather a grading of 
the securities in question. Even if this can be achieved by grading the individual 
predictions of returns, it is not obvious that it will yield an optimal decision based 
on a limited amount of noisy data. In Section 2 we introduce a rank measure that 
takes into account a large number of securities and grades them according to the 
relative returns. The rank concept has in a previous study [5] shown a potential 
for good predictability. In Section 4, perceptron models for prediction of the 
rank are defined and historical data is used to estimate the parameters in the 
models. Results from time series predictions are presented. The predictions are 
used as a basis for a learning decision support system for stock picking described 
in Section 5. The surprisingly successful results are discussed. Section 6 contains 
a summary of the results together with ideas for future research. 



P. Brazdil and A. Jorge (Eds.): EPIA 2001, LNAI 2258, pp. 130-141, 2001.
© Springer-Verlag Berlin Heidelberg 2001

2 Defining a Rank Measure

The $k$-day return $R_k^m(t)$ for a stock $m$ with close prices $y^m(t)$ is defined
for $t \in [k+1, t_1]$ as

$$R_k^m(t) = \frac{y^m(t) - y^m(t-k)}{y^m(t-k)}. \qquad (1)$$

We introduce a rank concept based on the $k$-day return $R_k$ as follows:
The $k$-day rank $A_k^m$ for a stock $s_m$ in the set $\{s_1, \ldots, s_N\}$ is computed by ranking
the $N$ stocks in the order of the $k$-day returns $R_k$. The ranking orders are then
normalized so that the stock with the lowest $R_k$ is ranked $-0.5$ and the stock with
the highest $R_k$ is ranked $0.5$. The definition of the $k$-day rank $A_k^m$ for a stock $m$
belonging to a set of stocks $\{s_1, \ldots, s_N\}$ can thus be written as

$$A_k^m(t) = \frac{\#\{R_k^i(t) \mid R_k^m(t) \ge R_k^i(t),\ 1 \le i \le N\} - 1}{N - 1} - 0.5 \qquad (2)$$

where the $\#$ function returns the number of elements in the argument set. This
is an integer between 1 and $N$. $R_k^m$ is the $k$-day return computed for stock $m$.
The scaling between $-0.5$ and $+0.5$ assigns the stock with the median value of
$R_k$ the rank 0. A positive rank $A_k^m$ means that stock $m$ performs better than
this median stock, and a negative rank means that it performs worse. This new
measure gives an indication of how each individual stock has developed relative
to the other stocks, viewed on a time scale set by the value of $k$.
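
The two definitions above can be sketched in a few lines of Python. This is a minimal
illustration of equations (1) and (2), assuming that prices are held in a plain list indexed
from 0 and that the returns of all N stocks for the same day are passed in one list; the
function names are hypothetical.

```python
def k_day_return(prices, t, k):
    """Eq. (1): relative change of the close price over the last k days (index t >= k)."""
    return (prices[t] - prices[t - k]) / prices[t - k]


def k_day_ranks(returns):
    """Eq. (2): map the N same-day returns of N stocks to ranks in [-0.5, 0.5]."""
    n = len(returns)
    if n < 2:
        raise ValueError("at least two stocks are needed to form a rank")
    ranks = []
    for r_m in returns:
        # Number of stocks whose return does not exceed r_m (the stock counts itself).
        count = sum(1 for r_i in returns if r_m >= r_i)
        ranks.append((count - 1) / (n - 1) - 0.5)
    return ranks


# Three hypothetical stocks with 1-day returns of +2%, -1% and +5%.
print(k_day_ranks([0.02, -0.01, 0.05]))  # [0.0, -0.5, 0.5]
```

The median stock receives rank 0, the worst stock -0.5 and the best stock 0.5, as stated in
the text.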

The scaling around zero is convenient when defining a prediction task for
the rank. It is clear that an ability to identify, at time $t$, a stock $m$ for which
$A_h^m(t+h) > 0$, $h > 0$, means an opportunity to make a profit in the same way
as identifying a stock for which $R_h^m(t+h) > 0$. A method that can identify
stocks $m$ and times $t$ with a mean value of $A_h^m(t+h) > 0$, $h > 0$, can be used
as a trading strategy that can do better than the average stock. The hit rate
for the predictions can be defined as the fraction of times for which the sign
of the predicted rank $\hat{A}_h^m(t+h)$ is correct. A value greater than 50% means
that true predictions have been achieved. The following advantages compared to
predicting returns $R_h^m(t+h)$ can be noticed:

1. The benchmark for the performance of predictions of ranks $A_h^m(t+h)$ becomes
clearly defined:

- A hit rate > 50% for the predictions of the sign of $A_h^m(t+h)$ means that
we are doing better than chance. When predicting returns $R_h^m(t+h)$, the
general positive drift in the market causes more than 50% of the returns
to be > 0, which means that it is hard to define a good benchmark.

- A positive mean value for predicted positive ranks $A_h^m(t+h)$ (and a
negative mean value for predicted negative ranks) means that we are
doing better than chance. When predicting returns $R_h^m(t+h)$, the general
positive drift in the market causes the returns to have a mean value > 0.
Therefore, a mere positive mean return for predicted positive returns
does not imply any useful predicting ability.

2. The rank values $A_k^m(t)$, for time $t$ and a set of stocks $m = 1, \ldots, N$, are
uniformly distributed between $-0.5$ and $0.5$ provided no return values are
equal. Returns $R_k^m$, on the other hand, are distributed with sparsely
populated tails for the extreme low and high values. This makes the statistical
analysis of rank predictions safer and easier than predictions of returns.

3. The effect of global events gets automatically incorporated into the
predictor variables. The analysis becomes totally focused on identifying deviations
from the average stock, instead of trying to model the global economic
situation.
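
Two of the points above lend themselves to a quick numerical check. The sketch below, with
hypothetical function names and toy numbers, shows that adding a common market drift to all
returns leaves the ranks unchanged (point 3), that distinct returns give evenly spaced ranks
(point 2), and how a hit rate for the sign of the predicted rank could be computed. Counting
a realized rank of exactly zero as a miss is an assumption made here for simplicity.

```python
def ranks(returns):
    """Normalized ranks in [-0.5, 0.5] of the same-day returns of N stocks."""
    n = len(returns)
    return [(sum(1 for r_i in returns if r_m >= r_i) - 1) / (n - 1) - 0.5
            for r_m in returns]


def hit_rate(predicted, realized):
    """Fraction of non-zero predictions whose sign agrees with the realized rank."""
    pairs = [(p, a) for p, a in zip(predicted, realized) if p != 0]
    hits = sum(1 for p, a in pairs if p * a > 0)
    return hits / len(pairs)


returns = [0.01, -0.02, 0.03, 0.0, 0.015]
with_drift = [r + 0.05 for r in returns]       # the whole market moves up by 5%
print(ranks(returns) == ranks(with_drift))     # True: ranks ignore the common drift
print(sorted(ranks(returns)))                  # [-0.5, -0.25, 0.0, 0.25, 0.5]
print(hit_rate([0.5, -0.25, 0.25, -0.5], [0.25, -0.5, -0.25, 0.5]))  # 0.5
```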

3 Serial Correlation in the Ranks 

We start by looking at the serial correlation for the rank variables as defined
in (2). In Table 1, mean ranks $A_1^m(t+1)$ are tabulated as a function of $A_k^m(t)$
for 207 stocks from the Swedish stock market 1987-1997. Table 2 shows the "Up
fraction", i.e. the number of positive ranks $A_1^m(t+1)$ divided by the number of
non-zero ranks. Table 3 finally shows the number of observations of $A_1^m(t+1)$
in each table entry. Each row in the tables represents one particular value of
$k$, covering the values 1, 2, 3, 4, 5, 10, 20, 30, 50, 100. The label for each column
is the mid-value of a symmetrical interval. For example, the column labeled
0.05 includes points with $k$-day rank $A_k^m(t)$ in the interval $[0.00, 0.10[$. The
intervals for the outermost columns are open-ended on one side. Note that the
stock price time series normally have 5 samples per week, i.e. $k = 5$ represents
one week of data and $k = 20$ represents approximately one month. Example:
There are 30548 observations where $-0.40 \le A_2^m(t) < -0.30$ in the investigated
data. In these observations, the 1-day ranks on the following day, $A_1^m(t+1)$,
have an average value of 0.017, and an "Up fraction" = 52.8%.



Table 1. Mean 1-step ranks for 207 stocks

                                      k-day rank
   k    -0.45   -0.35   -0.25   -0.15   -0.05    0.05    0.15    0.25    0.35    0.45
   1    0.067   0.017  -0.005  -0.011  -0.011  -0.004  -0.005  -0.010  -0.014  -0.033
   2    0.060   0.017   0.002  -0.004  -0.010  -0.003  -0.007  -0.015  -0.017  -0.032
   3    0.057   0.016   0.003  -0.005  -0.003  -0.008  -0.011  -0.011  -0.015  -0.034
   4    0.054   0.018   0.003  -0.003  -0.005  -0.008  -0.011  -0.013  -0.012  -0.032
   5    0.051   0.015   0.004  -0.002  -0.004  -0.009  -0.010  -0.009  -0.016  -0.032
  10    0.040   0.013   0.005  -0.001  -0.003  -0.006  -0.007  -0.009  -0.012  -0.030
  20    0.028   0.008   0.003  -0.003  -0.002  -0.002  -0.006  -0.011  -0.009  -0.019
  30    0.021   0.007   0.002   0.004  -0.003  -0.003  -0.006  -0.006  -0.011  -0.015
  50    0.014   0.005   0.000  -0.000  -0.001  -0.002  -0.005  -0.004  -0.006  -0.010
 100    0.007   0.003   0.001  -0.002  -0.003  -0.004  -0.004  -0.004  -0.003  -0.008

Table 2. Fraction up/(up+down) moves (%)

                                      k-day rank
   k    -0.45   -0.35   -0.25   -0.15   -0.05    0.05    0.15    0.25    0.35    0.45
   1     59.4    52.9    49.1    47.3    48.0    49.6    49.5    48.2    47.8    46.4
   2     58.4    52.8    49.7    48.9    48.4    49.7    48.9    47.4    47.6    46.2
   3     58.1    52.4    50.3    48.9    49.1    49.0    48.1    48.1    47.8    46.1
   4     57.5    52.5    50.4    49.2    49.0    48.7    48.0    47.9    48.6    46.3
   5     57.1    52.0    50.4    49.4    49.1    48.5    48.2    48.6    47.7    46.3
  10     55.6    51.7    50.4    49.8    49.3    48.8    48.7    48.5    48.2    46.3
  20     53.8    51.1    50.2    49.6    49.4    49.5    49.0    48.3    48.7    47.8
  30     52.7    50.9    50.3    50.8    49.1    49.2    48.8    48.9    48.5    48.4
  50     52.0    50.7    49.6    49.9    49.6    49.6    49.0    49.3    49.1    48.9
 100     51.4    50.4    49.9    49.5    49.2    49.2    49.0    49.2    49.6    49.1



The only clear pattern that can be seen in the tables is a slight negative
serial correlation: negative ranks are followed by more positive ranks and vice
versa. To investigate whether this observation reflects a fundamental property
of the process generating the data, and not only idiosyncrasies in the data, the
relation between current and future ranks is also presented in graphs, in which
one curve represents one year. Figure 1 shows $A_1^m(t+1)$ versus $A_1^m(t)$, i.e.
1-day ranks on the following day versus 1-day ranks on the current day. The
same relation for 100 simulated random-walk stocks is shown in Figure 2 for
comparison.

From Figure 1 we can conclude that the rank measure exhibits a mean-reverting
behavior, where a strong negative rank is on average followed by a positive
rank. Furthermore, a positive rank is on average followed by a negative rank on
the following day. Looking at the "Up fraction" in Table 2, the uncertainty in
these relations is still very high. A stock $m$ with a rank $A_1^m(t) < -0.4$ has a
positive rank $A_1^m(t+1)$ the next day in no more than 59.4% of all cases. However,
the general advantages described in the previous section, coupled with the shown
correlation between present and future values, do make the rank variables very
interesting for further investigations. In [3] the observed mean-reverting behavior
is exploited in a simple trading system. In the next section, the rank measure is
used both as input and output in a model for prediction of future ranks.

Table 3. Number of points

                                      k-day rank
   k    -0.45   -0.35   -0.25   -0.15   -0.05    0.05    0.15    0.25    0.35    0.45
   1    30878   30866   31685   30837   30434   31009   31258   30539   30951   31550
   2    30926   30548   31427   30481   30442   31116   31263   30435   30841   31675
   3    30922   30440   31202   30404   30350   31146   31061   30449   30814   31697
   4    30887   30315   31052   30320   30371   31097   31097   30328   30777   31776
   5    30857   30293   30951   30275   30191   31049   31144   30254   30701   31816
  10    30755   30004   30648   29958   30004   30875   30889   30155   30571   31775
  20    30521   29635   30306   29591   29679   30560   30580   29836   30377   31692
  30    30388   29371   30083   29388   29567   30349   30437   29652   30190   31503
  50    30117   29006   29728   28979   29306   29876   30109   29236   29927   31159
 100    29166   28050   28790   28011   28238   29015   29049   28254   29012   30460

Fig. 1. 1-day ranks $A_1^m(t+1)$ versus $A_1^m(t)$ for 207 stocks. Each curve represents one
year between 1987 and 1997.

Fig. 2. 1-day ranks $A_1^m(t+1)$ versus $A_1^m(t)$. Each curve represents one year with 100
simulated random-walk stocks.

4 Predicting the Ranks with Perceptrons

For a stock $m$, we attempt to predict the $h$-day rank $h$ days ahead by fitting a
function $g_m$ so that

$$\hat{A}_h^m(t+h) = g_m(I_t) \qquad (3)$$

where $I_t$ is the information available at time $t$. It may, for example, include
stock returns $R_k^m(t)$, ranks $A_k^m(t)$, traded volume etc. The prediction problem (3)
is as general as the corresponding problem for stock returns, and can of course
be attacked in a variety of ways. Our choice in this first formulation of the
problem assumes a dependence between the future rank $A_h^m(t+h)$ and current
ranks $A_k^m(t)$ for different values of $k$, i.e. a stock's tendency to be a winner in
the future depends on its winner property in the past, computed for different
time horizons. This assumption is inspired by the autocorrelation analysis in
Hellstrom [5], and also by previous work by De Bondt, Thaler [1] and Hellstrom
[3] showing how these dependencies can be exploited for prediction and trading.
Confining our analysis to 1, 2, 5 and 20 days horizons, the prediction model (3) is
refined to

$$\hat{A}_h^m(t+h) = g_m(A_1^m(t), A_2^m(t), A_5^m(t), A_{20}^m(t)). \qquad (4)$$

The choice of function $g_m$ could be a complex neural network or a simpler
function. Our first attempt is a perceptron, i.e. the model is

$$\hat{A}_h^m(t+h) = f(p_0^m + p_1^m A_1^m(t) + p_2^m A_2^m(t) + p_3^m A_5^m(t) + p_4^m A_{20}^m(t)) \qquad (5)$$

where the activation function $f$ for the time being is set to a linear function.
The parameter vector $p^m = (p_0^m, p_1^m, p_2^m, p_3^m, p_4^m)$ is determined by regression on
historical data. For a market with $N$ stocks, $N$ separate perceptrons are built,
each one denoted by the index $m$. The $h$-day rank $A_h^m$ for time $t+h$ is predicted
from the 1-day, 2-day, 5-day and 20-day ranks, computed at time $t$. To facilitate
further comparison of the $N$ produced predictions, they are ranked in a similar
way as in the definition of the ranks themselves:

$$\hat{A}_h^m(t+h) \leftarrow -0.5 + \frac{\#\{\hat{A}_h^i(t+h) \mid \hat{A}_h^m(t+h) \ge \hat{A}_h^i(t+h),\ 1 \le i \le N\} - 1}{N - 1}. \qquad (6)$$

In this way the $N$ predictions $\hat{A}_h^m(t+h)$, $m = 1, \ldots, N$, get values uniformly
distributed between $-0.5$ and $0.5$, with the lowest prediction having the value
$-0.5$ and the highest prediction having the value $0.5$.
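
The prediction and re-ranking steps can be sketched as follows. This is only an illustration:
the parameter values are made up rather than fitted by regression, the bias term p0 is an
assumption about the form of the model, and the function names are hypothetical.

```python
def perceptron_predict(p, a1, a2, a5, a20, f=lambda z: z):
    """Model (5): activation f (linear by default) applied to a weighted sum of ranks.

    p = (p0, p1, p2, p3, p4); p0 is an assumed bias term.
    """
    return f(p[0] + p[1] * a1 + p[2] * a2 + p[3] * a5 + p[4] * a20)


def rerank(predictions):
    """Transformation (6): replace raw predictions by normalized ranks in [-0.5, 0.5]."""
    n = len(predictions)
    return [(sum(1 for q in predictions if p >= q) - 1) / (n - 1) - 0.5
            for p in predictions]


# Hypothetical mean-reverting parameters: only the 1-day rank matters, with negative weight.
params = (0.0, -0.1, 0.0, 0.0, 0.0)
current_1day_ranks = [-0.5, 0.0, 0.5]          # three stocks at time t
raw = [perceptron_predict(params, a1, 0.0, 0.0, 0.0) for a1 in current_1day_ranks]
print(rerank(raw))                             # [0.5, 0.0, -0.5]
```

With these made-up weights, yesterday's loser receives the highest predicted rank, which
mirrors the mean-reverting pattern reported in Section 3.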



4.1 Data and Experimental Set-Up 

The data used in the study comes from 80 stocks on the Swedish
stock market from January 1, 1989 to December 31, 1997. We have used a sliding
window technique, where 1000 points are used for training and the following
100 are used for prediction. The window is then moved 100 days ahead and the
procedure is repeated until the end of the data. The sliding window technique is a
better alternative than cross validation, since data at time $t$ and at time $t+k$, $k > 0$,
is often correlated (consider for example the returns $R_k^m(t)$ and $R_k^m(t+1)$).
In such a case, predicting a function value $A_k^m(t_1 + 1)$ using a model trained
with data from time $t > t_1$ is cheating and should obviously be avoided. The
sliding window approach means that a prediction $\hat{A}_h^m(t+h)$ is based on close
prices $y^m(t-k), \ldots, y^m(t)$. Since 1000 points are needed for the modeling, the
predictions are produced for the years 1993-1997.
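
The sliding window scheme can be written as a small generator. This is a sketch under the
assumption that a final, incomplete test block is simply dropped; the function name and the
default arguments are illustrative.

```python
def sliding_windows(n_points, train_size=1000, test_size=100):
    """Yield (train, test) index ranges; every test index lies after all training indices."""
    start = 0
    while start + train_size + test_size <= n_points:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # move the window ahead by one test block


for train, test in sliding_windows(1250):
    print(train[0], train[-1], test[0], test[-1])
# 0 999 1000 1099
# 100 1099 1100 1199
```

Because each model only sees data that precede its test block, no information from the
future leaks into the predictions.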



4.2 Evaluation of the Rank Predictions

The computed models $g_m$, $m = 1, \ldots, N$, at each time step $t$ produce $N$
predictions $\hat{A}_h^m(t+h)$ of the future ranks for the $N$ stocks. The $N$ predictions
$\hat{A}_h^m$, $m = 1, \ldots, N$, are evenly distributed by transformation (6) in $[-0.5, 0.5]$.
As we shall see in the following section, we can construct a successful trading
system utilizing only a few of the $N$ predictions. Furthermore, even viewed as
$N$ separate predictions, we have the freedom of rejecting predictions if they are
not viewed as reliable or profitable*. By introducing a cut-off value $\gamma$, a
selection of predictions can be made. For example, $\gamma = 0.49$ means that we are only
considering predictions $\hat{A}_h^m(t+h)$ such that $|\hat{A}_h^m(t+h)| > 0.49$.

* As opposed to many other applications, where the performance has to be calculated
as the average over the entire data set.

The results for 1-day predictions of 1-day ranks $\hat{A}_1^m(t+1)$ for $\gamma = 0.0$
and $\gamma = 0.49$ are presented in Tables 4 and 5. Each column in the tables represents
performance for one trading year, with the rightmost column showing the mean
values for the entire time period. The rows in the tables contain the following
performance measures:

1. Hitrate+. The fraction of predictions $\hat{A}_h^m(t+h) > \gamma$ with correct sign.
A value significantly higher than 50% means that we are able to identify
higher-than-average performing stocks better than chance.

2. Hitrate-. The fraction of predictions $\hat{A}_h^m(t+h) < -\gamma$ with correct sign.
A value significantly higher than 50% means that we are able to identify
lower-than-average performing stocks better than chance.

3. Return+. $100 \cdot$ mean value of the $h$-day returns $R_h^m(t+h)$ for predictions
$\hat{A}_h^m(t+h) > \gamma$.

4. Return-. $100 \cdot$ mean value of the $h$-day returns $R_h^m(t+h)$ for predictions
$\hat{A}_h^m(t+h) < -\gamma$.

5. #Pred+. Number of predictions $\hat{A}_h^m(t+h) > \gamma$.

6. #Pred-. Number of predictions $\hat{A}_h^m(t+h) < -\gamma$.

7. #Pred. Total number of predictions $\hat{A}_h^m(t+h)$.
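
The seven measures listed above can be computed with a short function. The sketch below is
an illustration with hypothetical names and toy data; it assumes that predicted ranks,
realized ranks and realized returns are given as parallel lists, and it reports `None` when no
prediction passes the cut-off.

```python
def evaluate(predicted, realized_ranks, realized_returns, gamma):
    """Performance measures for the predictions selected by the cut-off value gamma."""
    rows = list(zip(predicted, realized_ranks, realized_returns))
    pos = [(a, r) for p, a, r in rows if p > gamma]    # predicted winners
    neg = [(a, r) for p, a, r in rows if p < -gamma]   # predicted losers

    def hit_rate(selected, sign):
        # Percentage of selected predictions whose realized rank has the predicted sign.
        if not selected:
            return None
        return 100.0 * sum(1 for a, _ in selected if sign * a > 0) / len(selected)

    def mean_return(selected):
        # 100 times the mean realized return, i.e. the mean return in percent.
        if not selected:
            return None
        return 100.0 * sum(r for _, r in selected) / len(selected)

    return {
        "hitrate_pos": hit_rate(pos, +1), "hitrate_neg": hit_rate(neg, -1),
        "return_pos": mean_return(pos), "return_neg": mean_return(neg),
        "n_pos": len(pos), "n_neg": len(neg), "n_total": len(rows),
    }


# Four hypothetical predictions for one day.
measures = evaluate([0.5, 0.25, -0.25, -0.5], [0.25, -0.5, 0.5, -0.25],
                    [0.02, -0.01, 0.03, -0.02], gamma=0.49)
print(measures["n_pos"], measures["n_neg"], measures["hitrate_pos"])  # 1 1 100.0
```

Raising gamma keeps only the most extreme predictions, which is the selection mechanism
used when comparing Tables 4 and 5.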

All presented values are average values over time $t$ and over all involved stocks $m$.
The performance for the one-day predictions is shown in Tables 4 and 5. In
Table 4 with $\gamma = 0.00$, the hit rates Hitrate+ and Hitrate- are not significantly
different from 50% and indicate low predictability. However, the difference
between the mean returns (Return+ and Return-) for positive and negative rank
predictions shows that the sign of the rank prediction really separates the
returns significantly. By increasing the cut-off value to $\gamma = 0.49$,
the hit rate goes up to 64.2% for predicted positive ranks (Table 5).
Furthermore, the difference between the mean returns for positive and negative rank
predictions (Return+ and Return-) is substantial. Positive predictions of ranks
are on average followed by a return of 0.895%, while a negative rank prediction is
on average followed by a return of 0.085%. The rows #Pred+ and #Pred- show
the number of selected predictions, i.e. the ones greater than $\gamma$ and the ones
less than $-\gamma$ respectively. For $\gamma = 0.49$ these numbers add up to about 2.7% of the
total number of predictions. This is normally considered insufficient when single
securities are predicted, both on statistical grounds and for practical reasons
(we want decision support more often than a few times per year). But since the
ranking approach produces a uniformly distributed set of predictions each day
(in the example 80 predictions), there is always at least one selected prediction
for each day, provided $\gamma < 0.5$. Therefore, we can claim that we have a method
by which, every day, we can pick a stock that goes up more than the average
stock the following day with probability 64%. This is by itself a very strong result
compared to most published single-security predictions of stock returns (see for
example Burgess and Refenes [2], Steurer [8] or Tsibouris and Zeidenberg [9]).

Table 4. 1-day predictions of 1-day ranks, $|\hat{A}_1^m(t+1)| > 0.00$

Year:        93       94       95       96       97       93-97
Hitrate+     51.1     53.4     53.3     53.0     52.5     52.7
Hitrate-     51.8     53.6     53.4     53.2     52.6     52.9
Return+      0.389    0.101    0.155    0.238    0.172    0.212
Return-      0.253   -0.176   -0.094    0.057    0.008    0.010
#Pred+       7719     8321     8313     8923     8160     41510
#Pred-       7786     8343     8342     8943     8172     41664
#Pred        15505    16664    16655    17866    16332    83174

Table 5. 1-day predictions of 1-day ranks, $|\hat{A}_1^m(t+1)| > 0.49$

Year:        93       94       95       96       97       93-97
Hitrate+     59.7     65.1     67.9     66.7     61.2     64.2
Hitrate-     52.7     53.2     56.4     59.4     56.7     55.7
Return+      1.468    0.583    0.888    0.770    0.745    0.895
Return-      1.138   -0.236   -0.402   -0.040   -0.055    0.085
#Pred+       211      215      218      228      214      1088
#Pred-       222      220      220      234      217      1115
#Pred        15505    16664    16655    17866    16332    83174

5 Decision Support

The rank predictions are used as the basis for a decision support system for stock
picking. The layout of the decision support system is shown in Figure 3. The
1-day predictions $\hat{A}_1^m(t+1)$ are fed into a decision maker that generates buy and
sell signals that are executed by the ASTA trading simulator. The decision maker
is controlled by a parameter vector $x$, comprising the threshold values of the step
functions that generate buy and sell signals from the rank predictions. Two other
parameters control the amount of money the trader is allowed to invest in one
single trade, and how large a fraction of the total wealth should be kept in cash
at all times. The learning element comprises an optimizing algorithm, which
finds the parameter vector $x$ that maximizes the mean annualized Sharpe ratio
for the simulated trader. The learning period is 2 years and the optimal $x$ found
is then used to control the trading in the following year. The procedure is
repeated yearly, using the last 2 years as learning period.

Fig. 3. The trading system is based on 1-day rank predictions for N stocks, and a
decision maker parameterized by the x-vector. The learning element comprises a numerical
optimizer, which finds the x that maximizes the Sharpe ratio for the historical data.

The ASTA system is a general-purpose tool for development of trading and
prediction algorithms. A technical overview of the system can be found in
Hellstrom [4] and examples of usage in Hellstrom [3] and Hellstrom, Holmstrom [6].
More information can also be found at http://www.cs.umu.se/~thomash. The rank
measure and the prediction algorithm described in Section 4 are implemented in
ASTA, and therefore the test procedure is very straightforward. A transaction cost
of 0.15% (minimum 90 Swedish crowns ~ USD) is assumed for every buy or sell order.
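
The ingredients of the decision maker can be sketched as follows. The text does not spell out
the Sharpe ratio formula or the exact form of the step functions, so this sketch makes
assumptions: a common textbook Sharpe ratio (mean excess return per period divided by its
sample standard deviation, scaled by the square root of the number of periods per year), and
a simple threshold rule for the signals. All function and parameter names are hypothetical;
only the 0.15% rate and the 90-crown minimum come from the text.

```python
import math


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio under the textbook definition assumed here."""
    excess = [r - risk_free for r in returns]
    n = len(excess)
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / (n - 1)
    if var == 0:
        raise ValueError("returns have zero variance")
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)


def transaction_cost(order_value, rate=0.0015, minimum=90.0):
    """0.15% of the order value, but at least 90 Swedish crowns."""
    return max(rate * order_value, minimum)


def signal(predicted_rank, buy_threshold, sell_threshold):
    """Assumed step-function rule: buy above one threshold, sell below the other."""
    if predicted_rank > buy_threshold:
        return "buy"
    if predicted_rank < sell_threshold:
        return "sell"
    return "hold"


print(transaction_cost(10_000))        # 90.0 (the minimum applies)
print(signal(0.5, 0.4, -0.4))          # buy
```

An optimizer in the spirit of the learning element would repeatedly simulate trading over the
learning period for different threshold values and keep those that give the highest Sharpe
ratio.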



5.1 Trading Results 

The annual trading profit is presented in Table 6. As can be seen, the
performance is very good. The trading strategy outperforms the benchmark (the
Swedish Generalindex) consistently and significantly every year, and the mean
annual profit made by the trading is 129.4%. The mean annual index increase
during the same period is 27.4%. The Sharpe ratio, which gives an estimate of
a risk-adjusted return, shows the same pattern. The average Sharpe ratio for
the trading strategy is 3.0, while trading the stock index Generalindex gives 1.6.
By studying the annual performance we can conclude that these differences in
performance are consistent for every year 1993-1997. Furthermore, the number
of trades every year is consistently high (seven buy and seven sell per week),
which increases the statistical credibility of the results. The trading results are
also displayed in Figure 4. The upper diagram shows the equity curves for the
trading strategy and for the benchmark index. The lower diagram shows the
annual profits.

Table 6. Trading results for the trading system shown in Figure 3.

Fig. 4. Performance for the simulated trading with stock picking based on 1-day rank
predictions as shown in Figure 3. The top diagram shows the equity curves for Trading
(6102%) and Index (222%) from Jan 93 to Jan 98, while the lower diagram displays the
annual profits (mean annual profits: Trading 129%, Index 27%). The trading outperforms
the benchmark index consistently every year.



Let us look at possible reasons and mechanisms that may lie behind the good 
results. In [7], Lo and MacKinlay report on positive cross-autocovariances across 
securities. These cross effects are most often positive in sign and are characterized 
by a lead-lag structure where returns for large-capitalization stocks tend to lead 
those of smaller stocks. Initial analysis of the trades that the rank strategy 
generates exposes a similar pattern, where most trading signals are generated for 
companies with relatively low traded volume. Positive cross-autocovariances 
can therefore provide part of an explanation for the successful trading results. 



6 Conclusions 

We have successfully implemented a model for prediction of a new rank measure 
for a set of stocks. The result shown is clearly a refutation of the Random Walk 
Hypothesis (RWH). Statistics for the 1-day predictions of ranks show that we 
are able to predict the sign of the threshold-selected rank consistently over the 
investigated 5-year period of daily predictions. Furthermore, the mean returns 
that accompany the ranks show a consistent difference for positive and negative 
predicted ranks which, besides refuting the RWH, indicates that the rank concept 
could be useful for portfolio selection. The experiment with an optimizing 
trading system confirms that this is indeed the case. The mean annual profit is 
129.4% compared to 27.4% for the benchmark portfolio, over the investigated 5-year 
period. The risk-adjusted return, as measured by the Sharpe ratio, exhibits 
the same relation. The trading system gives a Sharpe ratio of 3.0 while trading 
the benchmark portfolio gives only 1.6. 

Of course, the general idea of predicting ranks instead of returns can be 
implemented in many other ways than the one presented in this paper. Replacing 
the perceptrons with multi-layer neural networks and also adding other kinds of 
input variables to the prediction model (4) are exciting topics for future research. 

Abstract. Agents must decide, i.e., choose a preferred option from 
among a large set of alternatives, according to the precise context where 
they are immersed. Such a capability defines to what extent they are 
autonomous. But there is no one way of deciding, and the classical mode of 
taking utility functions as the sole support is not adequate for situations 
constrained by qualitative features (such as wine selection, or putting 
together a football team). The BVG agent architecture relies on the use 
of values (multiple dimensions against which to evaluate a situation) to 
perform choice among a set of candidate goals. In this paper, we pro- 
pose that values can also be used to guide the adoption of new goals 
from other agents. We argue that agents should base their rationalities 
on choice rather than search. 



1 Introduction 

For more than a decade, agent programming was based upon the BDI (Belief- 
Desire-Intention) model [12], and backed by Bratman’s theory of human practical 
reasoning [6]. However, this framework has not been fully clarified or extended 
to take motivations and open situations into account. When looking deeply 
into the autonomy issue, we were faced with decision making at large, within 
non-deterministic contexts, because the BDI idea was not able to allow the im- 
plementation of a sound choice machinery. As a matter of fact, this issue is 
innovative from a cognitive science point of view, and our proposal based on 
values allows for a richer alternative to the so-called classical utility functions. 

From a cognitive modelling standpoint, we bring the issue of choice into 
the nucleus of the agent architecture, and provide the mechanisms for adapta- 
tion of the choice machinery. From a decision theory standpoint, we propose a 
choice model that expands on the notion of utility to tackle multiple dimensions, 
and dynamic adaptation. From a philosophy of mind standpoint, we argue that 
evaluation of the results of choice cannot be made in absolute terms (the perfect 
choice does not exist), and so rationality is individual, situated and multi-varied. 
From a social simulation standpoint, we provide mechanisms to regulate (and 
later observe) interactions, namely, rules for goal adoption based on curiosity 
and imitation. 

In [1, 3], we have proposed a new model of rationality that goes beyond utility 
and encompasses the notion of value as a central component for decision-making 
and computing importance. In this model, an agent has its cognitive machinery 
roughly included in a BDI framework, and produces its choices from available 
alternatives by considering, for the given situation, how each relevant value is 
affected. A calculus performs choices by collapsing all these partial assessments 
into a sorted list of actions to be performed. This must be done at execution time, 
since it is not possible to foretell all relevant possibilities beforehand. Evaluative 
assessments can evolve, as values are dynamic objects in the mind, and so can be 
added, removed, or changed. This framework does not amount to multi-attribute 
utility theory [10], any more than it does to classical utility: the conditions for 
both of these choice theories are far too demanding, namely all alternatives 
must be comparable, and transitivity (or coherence) must be observed. 

Table 1. Comparison between the utilitarian and the multiple-values approaches. 

Utility functions         Choice functions 
------------------------  -------------------------- 
one dimension             several dimensions 
utility                   values 
closed world              open world 
linear orderings          quasi-orderings 
beforehand computation    execution-time computation 

All these operations are favoured by the contact of the agent with the envi- 
ronment, including other agents, or some user. If we want to achieve enhanced 
adaptability to a complex, dynamic environment, we should provide the agent 
with motivations, rather than plain commands. A degree of autonomy is required, 
as well as responsibility and animation. Autonomy is a social notion, and 
concerns the influence from other agents (including the user). An autonomous 
agent should be allowed to refuse some order or suggestion from another agent, 
but it should be allowed to adopt it as well. 

In [3], the value-based adaptive calculus provided some interesting results 
when facing decision problems that called for some adaptability, even in the 
absence of all the relevant information. However, there remains one difficulty in 
assessing the results of those experiments: how do we evaluate our evaluations? 
We need meta-values with which to assess those results, but this calls for a 
designer, and amounts to looking for emergent phenomena. It can be argued that, if 
these ‘higher’ values exist, why not use them for decision? This dilemma clearly 
shows the ad hoc character of most solutions, and the difficulty in escaping it. 

We can conceive two ways out. The first is the development of an ontology of 
values, to be used in some class of situations. Higher or lower, values have their 
place in this ontology, and their relations are clearly defined. For a given problem 
the relevant values can be identified and used, and appropriate experimental 
predictions postulated and tested. 

The second is the object of this paper. By situating the agent in an environ- 
ment with other agents, autonomy becomes a key ingredient, to be used with 
care and balance. The duality of value sets becomes a necessity, as agents cannot 







Luis Antunes, Joao Faria, and Helder Coelho 



access values at the macro level, which are judiciously made to coincide with the 
designer’s values. The answer is the designer, and the problem is methodological. The BVG 
(Beliefs, Values and Goals) model update mechanism provides a way to put 
this liaison between agent and designer to the test. It also definitively clarifies one of 
the mysteries of the BDI model, usually translated in an ad hoc way when some 
application is implemented. 

In the next section we shed some light on the notion of choice, and argue 
for the introduction of motivational concepts different from the ones usually 
considered in BDI (i.e. desires). Decision is about choice and preference, and not 
about plans and search. Next we introduce the BVG architecture, and its related 
mechanisms for decision. In the fourth section we maintain that the multiple-values 
decision framework will produce agents with enhanced autonomy, and present 
some value-based modes for regulating agent interaction. Then, we deal with one 
of these modes, by presenting concrete mechanisms for goal adoption. Section 6 
discusses some forms of orderings we can use for the serialisation of candidate 
options. In sections 7 and 8 we introduce our main case study, and discuss some 
of the experimental results we obtained. Finally we conclude by arguing that 
choice is one of the main ingredients for autonomy. 

2 Departing from BDI 

“Practical reasoning is a matter of weighing conflicting considerations 
for and against competing options, where the relevant considerations are 
provided by what the agent desires/values/cares about and what the 
agent believes.” [6, page 17] 

Bratman’s theory of human practical reasoning was the set of ideas that inspired 
the most widely used model of agency to date, the BDI model, where the trade-off 
is between two distinct processes: deciding what (deliberation) versus deciding 
how (means-ends reasoning). Yet the aim of the overall mechanism is directed 
towards actions, and agent autonomy is left out of the argument. So, the decision-making 
activity is not addressed and its choosing stage is not fully understood. We are 
firmly convinced that without discussing how “competing options” are ranged, 
ordered, and selected, autonomy is not possible. Our thesis is the following: 
any agent uses values to decide and finally execute some action. So, we need 
a theory of human practical decision to help us design and implement such 
autonomous agents. 

Looking back to the first implementations of the BDI model, the IRMA and PRS 
architectures (cf. respectively [7,13]), we found a simple control loop between 
the deliberation and the means-ends reasoning processes. Further enhancements 
focused on the deliberation process, the commitment strategies, intention 
reconsideration [15], and the introduction of a belief revision function, a meta-level 
control component, or a plan generation function. 

However, if we take a mainstream account of decision making such as [14], we 
find four stages involved: information gathering, likelihood estimation, deliberation 
(pondering of the alternatives) and choosing. If we look at Wooldridge’s [15] 
descriptions of both deliberation (deciding what to do) and means-ends reasoning 
(deciding how to do it), we will discover that both are decision processes, and 
so can be encompassed by the above account. Studying Cohen and 
Levesque’s famous equation, Intention = Choice + Commitment, we discovered that the 
choice part was not dissected as thoroughly as the commitment one, 
and we found no good reason for this. BDI is a logician’s dream that such processes 
can be ‘solved’ by theorem proving, or search, but those are not problems to be 
solved, rather decisions to be made. Or, broadly speaking, the issue at stake is 
not search, but rather choice. Search would imply the optimisation of a (one- 
dimensional) function, whereas in choice we can have a weighing over several 
dimensions (n-dimensional valuing). 

The whole decision process is a mix of quantitative and qualitative modes 
of reasoning: first, a list of sub-goals is built by deliberation, and without any 
order; second, a partial order is imposed by weighing and valuing conflicting 
considerations (e.g. feelings); third, the first sub-goals attached with values are 
“organised” unconsciously before the mind makes itself up. Recent findings of 
neuropsychology have come to the conclusion that many of our actions and 
perceptions are carried out by the unconscious part of our brains, and this means 
that emotion is no longer the sole control apparatus of decision-making [5]. 

The BVG (Beliefs, Values, Goals) model relaxes desires and intentions (gathered 
together into a general class of goals), sets aside considerations of consciousness, 
and only asserts that emotions are part of an overall global control machinery. 
We prefer, for the sake of simplification, that the agent is in control of its own mind 
(cf. [8]). Making these choices, we can study what affects the decision-making 
process (cf. figure 1). 




Fig. 1. Decision-making with the BVG architecture. Rectangles represent processes, 
and ovals represent sets with the main mental concepts. Arrows depict the flow of data 
through the decision process. 

Our work has mostly addressed the deliberation (in the sense of pondering 
of alternatives, since we regard ‘BDI-deliberation’ as a decision process in itself) 
part of the decision process. However, there is something to be said about the 
choosing stage. Once a sorting of the alternatives is achieved, choosing can be as 
simple as picking the first of the list, but other heuristics can be used, such as 
using an adequate probability distribution to pick one of the first n options, or 
again using values to inform this new process. 

The agent’s character is closely associated with his ability to choose the 
best/most appropriate sub-goal, i.e., each agent (egotistic, laconic, deceiving, 
etc.) has a specific policy for choosing/preferring some goal. The so-called personality 
of an agent means heuristics for re-ordering options, for arranging 
configurations (clusters) of preferences, in an n-dimensional space. 

The agent prefers an action by selecting credibility values (set-of-support) 
and by computing choice functions (policies), even amalgamating similar options 
(sub-goals) before a decision comes out. 

3 The BVG Architecture 

The BVG architecture roughly follows Castelfranchi’s principles for autonomy 
contained in his “Double Filter Architecture” [8]. The reference schema of the 
BVG architecture for decision-making includes goals, candidate actions to be 
chosen from, beliefs about states of the world, and values about several things, 
including desirability of those states. Values are dimensions along which situa- 
tions are evaluated, and actions selected. By dimension we mean a non empty 
set endowed with an order relation. 

In the model proposed in [1], emotions have a determinant role in the control 
of the choice process. One main role of emotions is to set the amount of time 
available for a decision to be made. We proposed a cumulative method that 
improves the quality of the decision when time allows it. The idea is based 
on the different importance of the other relevant values. Options are evaluated 
against the most important values first. 
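
The cumulative idea can be illustrated with a short sketch. It is only one possible reading, under stated assumptions: the time made available by emotions is modelled as an integer `budget` of dimensions that can be evaluated, partial scores are combined by a weighted sum, and lower totals are better. All names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: an option maps each value dimension to a partial score.
Option = Dict[str, float]


def cumulative_choice(options: Sequence[Option],
                      dimensions_by_importance: Sequence[str],
                      weights: Dict[str, float],
                      budget: int) -> List[int]:
    """Rank option indices using only the `budget` most important dimensions.

    A larger budget (more time) lets more dimensions contribute, so the
    ranking is refined as time allows.
    """
    used = dimensions_by_importance[: max(budget, 0)]
    totals = [0.0] * len(options)
    for dim in used:  # most important dimension first
        for i, option in enumerate(options):
            totals[i] += weights[dim] * option[dim]
    # Stable sort: options with equal totals keep their original order.
    return sorted(range(len(options)), key=lambda i: totals[i])
```

With a budget of one, only the most important value decides; increasing the budget can reverse the ranking once less important values are taken into account.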

The other key issue in the BVG architecture is the update of the choice 
machinery based on assessments of the consequences of the previous decisions. 
These assessments can be made in terms of (1) some measure of goodness (the 
quality of the decision, measured through its consequences on the world); (2) 
the same dimensions that the agent used for decision; and (3) a different set of 
dimensions, usually the dimensions that the designer is interested in observing, 
which amounts to looking for emergent behaviour (cf. [3]). 

Cases (1) and (2) were addressed in [3], where a cumulative choice function 
$F = \sum_k c_k F_k$ was proposed that, given some goal and alternatives characterised 
by values $V_k$, would sort those alternatives, selecting the best of them for 
execution. Afterwards, the result of this decision gets evaluated and is fed back 
into the choice mechanism. An appropriate function performs updates on the 
features that are deemed relevant for the decision. For case (1), function $G$ takes 
an assessment in dimension $V_n$, and distributes credit to every $v_{ki}$, feature of 
winning alternative $i$. In case (2), a more complex function $H$ takes a 
multi-dimensional assessment in $V_1 \times \cdots \times V_n$. In both cases, the choice function $F$ 
remains unchanged, since we maintain that our perception of alternatives is more 
likely to change than the way we perform choice. For instance, if based on some 
price-evaluating function P, we pick some wine for dinner and it turns out to be 
bad, we will probably prefer to mark that wine as a bad choice than to change 
P. P can no doubt change, but more slowly, and as a result of other factors, 
like the perception that we constantly pick bad wines, or a major increase in our 
income. 

Case (3) relates the evaluation of the experiments to the behaviours of the 
agents. We have a keen concern for autonomous behaviours. But autonomy can 
only be located in the eyes of the observer. To be autonomous, some behaviour 
need not be expected, but also not extravagant. This hard-to-define seasoning 
will be transported into our agents by an update function such as described 
above, but one that doesn’t depend exclusively on agent-available values. Re- 
turning to the wines example, suppose someone invites us over to dinner. A 
subservient friend would try to discover some wine we know and like to serve 
us with dinner. An autonomous one would pick some wine of his preference, regardless 
of the fact we don’t drink any alcohol. In between these two extreme 
behaviours is the concept of autonomy we are looking for. Our friend knows 
what we like, and serves something close but new, knowing we will cherish his 
care. A similar point was made in [4], where the opinions of a group of experts 
are to be combined. The opinion of each expert receives an a priori weight, and 
each expert aims at increasing his weight by being right most times about the 
outcome of the decision. The proven optimal strategy is to keep honesty and 
report one’s true opinion, even if locally it might appear as the worst choice. 



4 Autonomy in BVG 

We propose several steps in this move towards enhanced autonomy. First, let us 
observe that adaptation and learning should be consequences of the interaction 
between the agent and the world and its components. If the agent would adapt 
as a result of any form of orders from its designer, it wouldn’t be adaptation or 
learning, but design. Of course, if the agent is to respond to a user, this user 
should be present in its world. Castelfranchi’s [8] postulates emphasise the role 
of social interactions in the agent’s autonomy. We draw on these postulates to 
introduce some of the mechanisms involved in agent interaction. For further work 
on these issues, cf. [2]. 

Agents can share information, in particular evaluative information. Mechanisms 
to deal with this information can be fit into BVG in two different ways. 
(1) An agent receives information from another and treats it as if it were an 
assessment made by himself (possibly filtered through a credibility assignment 
function). Choice can be performed as always, since the new information was 
incorporated into the old one and a coherent picture is available. (2) The receiving 
agent registers the new information together with the old one, and all 
is considered together by the choice function $F$ when the time comes. A new 
component of $F$, say $F_{n+1}$, deals with the ‘imported’ information separately. 

Agents can also share goals. They can pick goals from each other for several 
reasons. We try to characterise those reasons by resorting to values. We have 
addressed the adoption of goals by imitation (see [1]). Other reasons could be: 
curiosity (the appeal of something new in the absence of foreseen drawbacks); 
affect (the adoption of some goal just because it pleases (or serves) another 
agent); etc. In section 5 we provide the mechanisms for some of these cases. 

Finally, agents can share values. In humans, the acquisition of these takes 
years, and relies heavily on some ingredients: some ‘higher’ value, or notion of 
what’s good and what’s bad, that, by the way, we would postulate as a common 
element to any ontology of values; intricate dedicated mechanisms, that include 
some of the ones we have been addressing, but also more difficult ones like children’s 
play and repetition (as opposed to the usual computer-like ‘one-shot comprehension’ 
[11]). Our mechanisms for value sharing are necessarily simple, and consist 
basically of conveying values and the respective components of the choice and update 
functions. Decisions to adopt are arbitrary, as we are at this point more 
interested in looking at the effects of these operations on the performance of the 
agents and their system. 



5 Goal Adoption 

Castelfranchi argues that an agent should adopt a goal only if this new goal serves 
an already existing goal of his [8]. In [1] this idea is embodied in a goal adoption 
rule and expanded into value-based goal adoption. The general mechanism 
introduced was that of imitation. A possible goal is perceived and considered 
for adoption according to the agent’s values. More complex rules were produced 
that were based on values as the links between goals of the different agents. 

If we want to maintain Castelfranchi’s rule, our rule of adoption 
by imitation is to adopt the candidate goal if there is already a goal that 
shares some value with the new goal to adopt. In the absence of a goal to be 
served by the goal candidate for adoption, we propose another rule that would 
base adoption upon the values concerning the imitated agent: 

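$$
\begin{aligned}
&\mathit{Adopt}(ag_A, \mathit{Goal}_{((V_1,\omega_1),\ldots,(V_k,\omega_k))}(ag_B, G_0)) \quad \text{if} \\
&\quad \exists\, \mathit{Bel}(ag_A, \mathit{Val}(ag_B, ((V_i,\xi'_i),\ldots,(V_j,\xi'_j)))) : \\
&\quad \exists\, \mathit{Val}(ag_A, ((V_i,\xi_i),\ldots,(V_j,\xi_j))) : \\
&\quad \forall m : i \le m \le j,\ \mathit{sameValue}(V_m, \xi'_m, \xi_m)
\end{aligned}
\tag{1}
$$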

In this paper, we use a slightly different rule of adoption by imitation, by 
adapting Castelfranchi’s requirement, while weakening the final condition linking 
the two goals, i.e., the values of agB are known through the description of its 
goal $G_0$, and it suffices for agA to have one value coincident with the ones of agB 
(we could have used agA's goals to infer its values, but conflicts might occur): 

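$$
\begin{aligned}
&\mathit{Adopt}(ag_A, \mathit{Goal}_{((V_1,\omega_1),\ldots,(V_k,\omega_k))}(ag_B, G_0)) \quad \text{if} \\
&\quad \exists\, \mathit{Val}(ag_A, (V_a, \xi_a)) : \\
&\quad \exists\, i\,(= a) \in \{1, \ldots, k\} : \mathit{sameValue}(V_i, \omega_i, \xi_a)
\end{aligned}
\tag{2}
$$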

Now we introduce another mechanism for value-based adoption: curiosity. An 
agent will adopt a goal from another agent if all of the values involved in this 
goal are completely new to him: 

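$$
\begin{aligned}
&\mathit{Adopt}(ag_A, \mathit{Goal}_{((V_1,\omega_1),\ldots,(V_k,\omega_k))}(ag_B, G_0)) \quad \text{if} \\
&\quad \neg \exists\, a : [\,1 \le a \le k \;\wedge\; \mathit{Val}(ag_A, (V_a, \xi_a))\,]
\end{aligned}
\tag{3}
$$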

We end this section by clearing up what the predicate sameValue means in 
the present framework. Given that most of our value features are numbers in a 
bounded scale, we set as standard the midpoint of that scale (in cases where 
the scale is unbounded ($V_1$, see section 7) we postulated a ‘normal’ mean value). 
Agents define themselves with respect to each value by stipulating on which side 
of that standard they are in the respective scale. So, in a scale from 0 to 20, with 
standard 10, two agents with preference values 12 and 15 will be on the same 
side (i.e., they share this value). 
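
The sameValue predicate and the two adoption rules of this section can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: value sets are dictionaries from dimension names to numbers, a value lying exactly on the standard counts as being on neither side, and rule (3) is read as "the agent holds no value on any dimension of the goal". All names are hypothetical.

```python
from typing import Dict

# Hypothetical representation: dimension name -> preference on that dimension.
Values = Dict[str, float]


def same_value(standard: float, x: float, y: float) -> bool:
    """True when x and y lie strictly on the same side of the scale's standard."""
    return (x - standard) * (y - standard) > 0


def adopt_by_imitation(own: Values, goal_values: Values,
                       standards: Dict[str, float]) -> bool:
    """Rule (2): adopt if at least one of the goal's values is shared by the agent."""
    return any(
        dim in own and same_value(standards[dim], w, own[dim])
        for dim, w in goal_values.items()
    )


def adopt_by_curiosity(own: Values, goal_values: Values) -> bool:
    """Rule (3): adopt if every dimension involved in the goal is new to the agent."""
    return not any(dim in own for dim in goal_values)
```

For the example in the text, `same_value(10, 12, 15)` holds, since both preferences lie above the standard of a 0 to 20 scale.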



6 Orderings 

In classical decision theory, we filter the set of possible options through a utility 
function (U), and into a linear ordering [9]. In particular, options with the same 
utility become indistinguishable. In the set of options, an equivalence relation 
becomes implicitly defined: $a \sim b$ iff $U(a) = U(b)$. But we want to distinguish a 
from b, or rather, give the agent the possibility of choosing one of the two options, 
based on its motivations. The agent’s motivations cannot be substituted by the 
sole motivation of maximising (expected) utility. 

The situations we want to address demand orderings weaker than the 
classical linear one, whether our standpoint is a deterministic analysis, or when 
the states of the world are not fully known, and an uncertainty measure can 
be associated: a probability that depends on the agent’s parameters. It is very 
clear to us that all uncertainty is prone to probabilisation, and this puts us 
foundationally in a subjectivist perspective of probability. If this were not the 
case, we would be stuck either with incomplete models or in situations where we 
cannot quantify the uncertainty. 

When we aim to serialise alternatives, we have several orderings on the different 
dimensions that are relevant to choice. These orderings are really quasi-orderings 
(i.e., reflexive, transitive and not antisymmetric) and may or may not have 
dichotomy. With these quasi-orderings we can build clusters of alternatives. 
Inside each cluster we can consider linear orderings, but the final decision must 
consider the universe of the different clusters. The global ordering (of all alternatives) 
we aim to get is typically not total (doesn’t have dichotomy, because 
different dimensions are not comparable) and not partial (doesn’t have antisymmetry, 
to avoid the indistinguishability of equally ‘scoring’ options, as explained 
above). 
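
A small sketch may help to see what such an ordering looks like. It uses componentwise comparison across dimensions as a stand-in quasi-ordering; this particular relation, the option names and the scores are assumptions made for illustration, not the ordering used in the experiments.

```python
from typing import Dict, Tuple

Scores = Tuple[float, ...]  # one score per dimension; lower is better (assumption)


def no_worse(a: Scores, b: Scores) -> bool:
    """Quasi-ordering: a precedes b when a is no worse than b on every dimension."""
    return all(x <= y for x, y in zip(a, b))


options: Dict[str, Scores] = {
    "w1": (1.0, 5.0),
    "w2": (1.0, 5.0),  # a distinct option with the same scores as w1
    "w3": (4.0, 2.0),
}
```

The relation is reflexive and transitive, but `w1` and `w2` precede each other while remaining distinct options (no antisymmetry), and `w1` and `w3` are incomparable (no dichotomy).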

When we group alternatives in clusters built out of different dimensions, we 
are in fact using a common process in decision. (No agent in a market really 
knows all alternatives, only those that are accessible to him.) Clusterisation 
allows for the decrease of complexity, by excluding alternatives from the search 
spaces. Thus, the decision problem can be decomposed in simpler sub-problems. 
But the problem of integrating the results of those sub-problems remains. This 
integration or amalgamation of the relevant options must be made in terms of a 
dichotomic quasi-ordering, because two different options must be distinguished, 
so that a choice can be performed. But the arrival at this ‘total’ mixture of 
orderings happens only at execution time; it does not represent an intrinsic (a 
priori) property of the agent. And this allows for the evolution of the choice 
machinery in its components. 

7 Case-Study: Selecting Wines 

Experimentation has been conducted along a case study whose increasing complexity 
calls for the successive introduction of our ideas. The general theme 
is selection of wines by some intelligent agent. In [3], the agent would characterise 
wines through five dimensions against which to evaluate them: ($V_1$) the 
rating of the producer; ($V_2$) the quality of a given region in a given year; ($V_3$) 
the price; ($V_4$) the window of preferable consumption; and ($V_5$) the quality of 
the given wine. 

Choice would then be performed by calculating the result of a choice function 
$F = \sum_{k=1}^{n} c_k F_k$ for each of the alternatives, and then picking the lowest 
score. When the wines get tasted, the results of each tasting are fed back into 
the agent’s choice machinery, and the impressions of the agent on the respective 
wine are updated. These results can be a simple rating of the quality of the wine, 
or a more complex qualitative description of the impressions of the wine. 
Performing the update in either of these two ways (cases 2 and 3 in 
section 3) led to different results on the agent’s selection procedure. 
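
The choose-then-update cycle described above can be sketched as follows. This is an illustration only: the dimension names, the encoding of each feature as a penalty (so that the lowest score wins), and the simple update rule on the quality feature are assumptions, not the calculus of [3].

```python
from typing import Dict

# Hypothetical names for the five dimensions V_1..V_5.
DIMENSIONS = ["producer", "region_year", "price", "window", "quality"]


def score(wine: Dict[str, float], weights: Dict[str, float]) -> float:
    """F = sum_k c_k * F_k, taking F_k as the wine's feature on dimension k."""
    return sum(weights[d] * wine[d] for d in DIMENSIONS)


def choose(wines: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Return the name of the wine with the lowest score."""
    return min(wines, key=lambda name: score(wines[name], weights))


def update_quality(wine: Dict[str, float], tasting: float, rate: float = 0.5) -> None:
    """Move the stored quality feature towards the tasting result; F is left unchanged."""
    wine["quality"] += rate * (tasting - wine["quality"])
```

Note that the feedback changes the agent's impression of the wine, not the weights of the choice function, in line with the argument of section 3.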

8 Results 

Somewhat unrealistically, once our agent achieved a conclusion about the best 
decision to be made, it would stick unconditionally to this decision: provided 
stock availability, it would keep buying the same wine. The reasons for this 
behaviour are clear, and have to do with the experimental setting not allowing 
for enough openness in the world, i.e. there is not enough source of change. So 
the agent does not have enough reason to change its choice. This is not such 
a bad result in certain contexts; for instance, people tend to stick to the same 
brand of car through long periods of their lives. This issue is linked with that 
of cognitive consonance, in that the weight of the previous decision overwhelms 
the desire to experiment with new options (and the burden of carrying out a new 
complete choice process). 

The solution we adopted was to limit stock availability, and let agents compete 
for the best wines: the market imposes itself on the agents. Another alternative 
(to be addressed eventually) would be to adapt the choice function $F$ to 
accommodate some novelty factor. This would amount to considering a new value 
representing the fact that every time we buy some wine we have a little less will 
to buy it again. In fact this amounts to using the ‘law of diminishing returns,’ 
classical in decision theory. Another solution would be for the choice 
procedure to produce not a unique winning decision, but a set of winning decisions 
from which our agent takes a probabilistically informed pick (see figure 1). 
Each option in this set has a weight that comes from its contribution to the 
global score of the set. With these weights we can build a random distribution. 
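
The probabilistic pick among a set of winners could look like the sketch below. It assumes each winner carries a non-negative merit score (higher meaning a larger share of the set's global score) and uses the standard library's weighted sampling; the function name and the score encoding are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_from_winners(winner_scores: Dict[str, float],
                      rng: Optional[random.Random] = None) -> str:
    """Pick one winner with probability proportional to its share of the total score."""
    rng = rng or random.Random()
    total = sum(winner_scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive number")
    names = list(winner_scores)
    # Each weight is the option's contribution to the global score of the set.
    weights = [winner_scores[name] / total for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes a simulation run repeatable, which helps when comparing experiments.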

A series of simulations addressed the issue of goal exchange. To minimise the 
number of processes to control, we followed this policy: when there is a lack of reasons 
to perform some action (taken from a class of similar actions), the selection is 
made arbitrarily (usually using random numbers with an adequate distribution). In 
our simulation, an agent will drop a randomly selected goal once in every four 
times he is active. This allows us to concentrate on the mechanisms of goal 
exchange, instead of those of the mental management of goals. 

We took 26 agents, of which only 2 were endowed with some goals. In some 
cases all goals were dropped rather quickly, whereas in others all agents had 
virtually all goals. In the most interesting cases, goals start travelling through 
the society. An agent who originally had all the goals may end up with just 
two, although he has spread them around before losing them. There’s the risk 
of the society losing goals, although that did not happen frequently. The bigger 
danger is of the society losing sense. What was meant as a means of enhancing 
bonds (through the sharing of interests) in the society ended up causing 
chaos. A typical result after 1000 runs would be: agA, from his original 
goals {g1, g2, g3, gdi, gdz, gdQ} has goals {gdeiffs} only; he dropped goals some 
14 times, including gd^ (so this goal was later adopted from another agent). Goal 
gd^ was lost, although this never would happen for goals g1, g2, and g3, which 
were originally also owned by agB. A total of 269 goals were dropped, and the 
whole society has now 76 goals, spread through the agents. Two agents have no 
goals, although all agents have had goals at some time. 
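
A toy version of such a goal-exchange loop is sketched below. It is not the simulation reported here: the scheduling of agents, the initial goal sets (every seeded agent starts with the same goals) and the adoption step (which always adopts, standing in for the value-based rules of section 5) are simplifying assumptions, and all names are hypothetical.

```python
import random
from typing import List, Set


def simulate_goal_exchange(n_agents: int = 26,
                           n_seeded: int = 2,
                           initial_goals: int = 6,
                           activations: int = 1000,
                           seed: int = 0) -> List[Set[str]]:
    """Agents adopt goals from peers and drop one goal on every fourth activation."""
    rng = random.Random(seed)
    goals: List[Set[str]] = [set() for _ in range(n_agents)]
    for a in range(n_seeded):  # only a few agents start with goals
        goals[a] = {f"g{i}" for i in range(1, initial_goals + 1)}
    times_active = [0] * n_agents

    for _ in range(activations):
        a = rng.randrange(n_agents)
        times_active[a] += 1
        if times_active[a] % 4 == 0 and goals[a]:
            # Once in every four activations, drop a randomly selected goal.
            goals[a].remove(rng.choice(sorted(goals[a])))
            continue
        b = rng.randrange(n_agents)
        candidates = sorted(goals[b] - goals[a])
        if candidates:
            # Stand-in for the value-based adoption rules: always adopt.
            goals[a].add(rng.choice(candidates))
    return goals
```

Counting the goals left in the returned list, and the agents left with none, gives the kind of summary statistics quoted above, although the numbers from this toy loop should not be expected to match them.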

When we cross these simulations with the choice-perform-update ones, we 
reach the conclusion that the reasons for the selected actions may 
have been lost, with high probability. Agents are committed to selections but 
they no longer have access to the rationale that caused those commitments, 
and so will have trouble deciding whether or not to keep them. Even worse, this 
jeopardises the update process and undermines the whole choice model, since this 
model by and large depends on a feedback loop that is now broken. 

It is apparent that overlooking grounded goal management was the 
cause of this problem. The focus on preference-based choice was too heavy, and 
a return to a more solidly built mental basis may be needed before readdressing 
the choice issues. This basis may as well be provided by BDI models, because of 
their emphasis on the web of technical, well-founded and well-structured mental 
states and corresponding relationships, and also because they are the inspiring 
models of BVG. Nevertheless, this only means that the balance between goal 
management and goal selection should be more thoroughly addressed. In no way 
do these results undermine the value of explicitly focussing on choice in the agent 
architecture context. 



9 Conclusions 

The main driving force behind the BVG architecture has always been to design
new rationalities (cognitive agent idealisations) and to inquire into motivation:
what makes us do what we do; what allows us to balance what we ought to do,
with what we can do, with what we want to do. And even how we rationalise a
bad (by our own judgement) decision in order not to agonise over it again and again.
The BVG model is another way to overcome the usual difficulties associated with
the use of normative models from economics and decision theory (maximisation
of expected utility). It provides a framework where multiple values are used to
compute practical (good enough) decisions (reasoning from beliefs to values, and
afterwards to goals and actions), and then these values are updated according to the
outcome of those decisions. In this and other recent papers we have striven to
show how this model produces enhanced autonomy in an environment where the
multiplicity of agents causes high complexity and unpredictability, by paying
more attention to the choice machinery besides optimisation. However, the topic
of multiple agents is now in focus, and the experiments done so far have helped to
improve our ideas for a better BVG proposal.

Our model departs from BDI (where choice is done by maximisation), by 
relaxing desires, and stating that agents need reasons for behaviour that surpass 
the simple technical details of what can be done, how it can be done and when 
it can be done. These reasons (which we might call ‘dirty,’ as opposed to ‘clean,’ 
logical ones) are then the basis of the agent’s character, and are founded on values 
as representatives of the multiple agent’s preferences. With the experimentation 
herein reported, we concluded that the dropping of those ‘clean’ reasons was 
perhaps hasty, and even a preference-conscious agent must possess a sound basis 
of logical mental concepts on which to ground. It is then only natural to return 
again to BDI as a provider of such a machinery, and to build the multiple-value
schema on top of it. Another conclusion is the necessity of goals on which
to base behaviour. Values are not enough, even when they can cause goals to
be adopted. A careful weaving is still in order. However, we have found that
interesting behaviours can be obtained from our agents: we find independent
behaviours, as well as leaders and followers; we can observe several degrees of
adaptation to a choice pattern, depending on the parameters of the model; we
can follow the spread of goals through a society of value-motivated agents. The
agents’ exploration of the space of possibilities is carried out differently from
what would happen with a search procedure (even if guided by heuristics) since
it is driven by the agents’ preferences, and these are heterogeneous.

Situated decision making is choosing among several alternatives and preferences
in ways that emulate human beings. Alternatives are actions in the world
that are selected according to preference rankings or probability judgements
(n-space values) to fit agent behaviours to precise dynamic contexts. Such an aim
is the key to autonomy.



Abstract. This paper presents a formal framework within which autonomous
agents can dynamically select and apply different mechanisms
to coordinate their interactions with one another. Agents use the task
attributes and environmental conditions to evaluate which mechanism
maximises their expected utility. Different agent types can be characterised
by their willingness to cooperate and the relative value they place
on short- vs long-term rewards. Our results demonstrate the viability of
empowering agents in this way and show the quantitative benefits that
agents accrue from being given the flexibility to control how they coordinate.



1 Introduction 

Autonomous agents are increasingly being deployed in complex applications 
where they are required to act rationally in response to uncertain and un- 
predictable events. A key feature of this rationality is the ability of agents to 
coordinate their interactions in ways that are suitable to their prevailing circumstances
[5]. Thus, in certain cases it may be appropriate to develop a detailed
plan of coordination in which each participant’s actions are rigidly
prescribed and numerous synchronisation points are identified. At other times, 
however, it may be appropriate to adopt much looser coordination policies in 
which the agents work under the general assumption that their collaborators are 
progressing satisfactorily and that no explicit synchronisation is needed. What 
this illustrates is that there is no universally best method of coordinating. Given 
this fact, we believe agents should be free to adopt, at run-time, the method 
that they believe is best suited to their current situation. Thus, for example, in 
relatively stable environments social laws may be adopted as the most appro- 
priate means of coordinating [10], whereas in highly dynamic situations one-off 
contracting models may be better suited [11], and in-between, mechanisms that 
involve the high-level interchange of participants’ goals may be best [4]. 
To achieve this degree of flexibility, agents need to be equipped with a suite
of coordination mechanisms (CMs) (with different properties and characteris- 
tics), be provided with a means of assessing the likely benefit of adopting the 
various mechanisms in the prevailing circumstances, and have the ability to se- 
lect and then enact the best mechanism. Against this background, this paper 
develops and empirically evaluates a generic decision making model that agents 
can employ to coordinate flexibly. Specifically, we identify a number of poten- 
tially differentiating features that are common to a wide range of CMs, provide a 
decision-theoretic model for evaluating and selecting between competing mech- 
anisms, and empirically evaluate the effectiveness of this model, for a number of 
CMs, in a suitably general agent scenario. This work builds upon the preliminary 
framework of [2], but makes the following advances to the general state of the 
art. Firstly, using a range of CMs, we show that agents can effectively evaluate 
and decide which to use, dependent on their prevailing conditions. Secondly, the 
evaluation functions associated with these CMs highlight the different types of 
uncertainty agents need to cope with and the different environmental parame- 
ters they need to monitor, in order to coordinate flexibly. Thirdly, we show that 
individual agent features such as their willingness to cooperate and the degree 
to which they discount their future rewards affect which CM they adopt.

The remainder of the paper is structured in the following manner. Section 2 
outlines the key components of the reasoning model and introduces the exemplar 
coordination models we evaluate in this work. Section 3 describes the grid world 
scenario we use for our evaluation. Section 4 formalises the reasoning models. 
Section 5 describes the experimental results and analysis. Section 6 looks at 
related work in this area. Finally, in section 7 we draw our conclusions. 



2 Coordination Mechanisms 

Flexible coordination requires the agents to know both how to apply a given 
CM and how to reason about which mechanism to select. In the former case, an 
agent must have access to the necessary protocols for coordinating with other 
agents and/or the environment. In the latter case, an agent must be capable of 
evaluating and comparing the possible alternatives. 



2.1 Protocols for Coordination 

Coordination involves the interworking of a number of agents, subject to a set of 
rules. The specification of exactly what is possible in a particular coordination 
context is given by the coordination protocol [8]. Thus, such protocols indicate 
the parties (or roles) that are involved in the coordination activity, what com- 
munication flows can occur between these parties, and how the participants can 
legally respond to such communications. Here we refer to the instigator of the 
coordination as the manager and to the other agents that assist the manager as 
the subordinates (or subs). 
For example, in the Contract Net protocol, the manager initiates a two-stage
process whereby bids are requested and received, and then selected and subs 
are appointed. In a simpler mechanism, such as being commanded by a superior 
officer, a sub simply obeys the commands it receives. In all cases, however, the 
key point is that for each mechanism an agent supports, it must have the ability 
and the know-how to enact the protocol. 

2.2 Evaluation of Mechanisms 

A manager that is faced with a coordination task will have several potential 
CMs at its disposal. Each such mechanism requires a means of determining the 
expected value it will provide, which should be comparable with the others avail- 
able. To this end, an evaluation function is needed. The value of a given CM 
may depend on many features including: the reward structure, the likely time of 
completion, and the likely availability of subordinates. Generally speaking, the 
more complex the coordination protocol and reward structure, the more com- 
plex the evaluation function. In particular, the more uncertainty that exists in 
the agent’s ability to set up and enact a CM, the harder it is to evaluate its use 
accurately. Moreover, some of the parameters that are needed for evaluation are 
likely to vary from mechanism to mechanism. A final consideration is that the 
value of an agent’s current situation may also depend on the state of the mecha- 
nism it has adopted. For example, an agent that has successfully recruited other 
agents may value its state more highly than one still engaged in the recruitment 
process. For all these reasons, evaluation functions need to be tailored to the 
specific CM they describe. 

When subordinates are invited to assist in coordination, they too must assess 
the value of accepting. These valuations are typically less complex than those 
for the managers since the reward on offer and the completion time are gener- 
ally declared, though in some cases a sub may also need to handle uncertainty. 
Subs also need to take into account whether accepting an offer would incur any 
additional cost, such as a penalty for dropping a commitment to another agent. 

2.3 Sample Mechanisms 

This section outlines the protocols and reward structures for the CMs considered 
in this work (their evaluation functions are left to section 4). Clearly this list is 
not exhaustive. Rather our aim is to incorporate specific exemplars that are typi- 
cal of the broad classes of coordination techniques that have been proposed in the 
literature. Thus, the precise form of each mechanism is of secondary importance 
to its broad characteristics and performance profile. Moreover, additional mech- 
anisms can be incorporated simply by providing appropriate characterisations 
and evaluation functions. Nevertheless we believe that the chosen mechanisms 
are sufficient for our main objective of demonstrating the efficacy of dynamically 
selecting CMs. 

In this work, tasks are assumed to have several attributes: a minimum per- 
sonnel requirement (mpr), a total requirement of agent effort (effort), and a 
reward that is paid to the managing agent when the task is accomplished. Thus
a task that has mpr of 3 and effort of 6 may be realised by 3 agents each 
contributing 2 units, by 6 agents each contributing 1 unit, but not by 2 agents 
each contributing 3 units. It follows that tasks with a high mpr are likely to 
incur a delay before the necessary subs are recruited and all agents can start 
working on the task together. Here agents can only work on one task at a time. 
However, each agent has a default task (mpr = 1, effort = 1) that it puts on
hold whenever it agrees to participate in a more complex one. The type of task 
and how many are generated can all be varied experimentally. 
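
The personnel and effort constraints can be made concrete with a small sketch. The `Task` class and the `can_realise` helper are illustrative names rather than part of the system described here, and the check assumes that each participating agent contributes a whole number of effort units.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    mpr: int       # minimum personnel requirement
    effort: int    # total agent effort required
    reward: float  # paid to the manager on completion


def can_realise(task: Task, contributions: List[int]) -> bool:
    """Return True if the per-agent effort contributions realise the task.

    Assumption: an agent counts towards the mpr only if it contributes effort.
    """
    workers = [c for c in contributions if c > 0]
    return len(workers) >= task.mpr and sum(workers) >= task.effort


# The example from the text: mpr of 3 and effort of 6.
task = Task(mpr=3, effort=6, reward=15.0)
three_agents = can_realise(task, [2, 2, 2])  # 3 agents, 2 units each
six_agents = can_realise(task, [1] * 6)      # 6 agents, 1 unit each
two_agents = can_realise(task, [3, 3])       # only 2 agents: below the mpr
```

Only the last split fails, because two agents cannot satisfy a personnel requirement of three however much effort each contributes.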

Asocial CM: This mechanism is used when a manager elects to perform a 
task alone; therefore there is no need to coordinate with any subs. The manager 
adopts the task, works on it and, ultimately, receives all the reward. This CM 
can only be used on tasks for which mpr = 1. 

Social Law CM: A manager may elect to have the task performed by invoking
a social law (SL) that has been agreed in advance by all the agents in the
system¹. For a task with mpr = n, the nearest n - 1 other agents are commanded
to work on the task with the manager. Because the social law extends to all
agents, the subordinates cannot refuse to help. They simply assist until they
are released from the task. The reward is then divided equally among all the
participants. In our experimental setting, the location of subs was performed by a
central coordinator, allowing minimal set up delay and preventing multiple
managers from attempting to command subs at the same time. A truly distributed
version is possible, though it would require a longer set up time.

Pot Luck CM: A manager that elects to use Pot Luck (PL) coordination sets
terms under which it is willing to pay subs on a piecemeal (step-by-step) basis.
These terms are then offered to all agents in the direct vicinity (this is called
“pot luck” since the manager makes no active effort to recruit subs, in the hope
that some are already present or wander by shortly). When the task is completed
the manager receives the full reward. From the subordinate’s point of view, it is 
occasionally offered “temporary” work for an indefinite period at a fixed rate; it 
either accepts and works on the task, or declines and ignores the offer. This CM 
is likely to be more successful when the environment is densely populated. But 
because the manager issues a blanket offer, it runs the risk of both over- and 
under-recruitment of subs. A sub can decommit from a PL task at any time at 
no penalty, keeping any reward it has already earned. 

Contract Net CM: A manager that elects to use Contract Net (CN) coor- 
dination requests bids from other agents that submit their terms according to 
their current circumstances. The manager selects from among the bids received 
and sends out firm offers. An agent receiving an offer either accepts and works 
on the task, eventually receiving a reward based on its bid, or declines. The 
manager may thus fail to recruit sufficient subs, in which case it repeats the
request-bid-offer cycle. When the task is accomplished, the manager receives the
whole reward but must pay off the subs with the agreed amounts. Again, under
this CM a manager may under-recruit if some agent declines its offer. Once a
sub has accepted a task under this CM, it may later decommit, though it will
have to pay a penalty for doing so. If a manager abandons a task under this CM,
it must pay off any subs it has recruited, according to their terms.

¹ The process of agreeing the social law is not considered in this work. In particular,
this means that the associated costs of achieving consensus are not factored into the
cost of this mechanism.
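
The distinguishing properties of the four mechanisms can be summarised as data. The sketch below is only a reading of the descriptions in this section: the `Mechanism` class and its field names are hypothetical, and the applicability limits (Asocial only for mpr = 1, Social Law and Contract Net only when subs are needed) follow the remarks made in this section and in the discussion of the experiments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mechanism:
    name: str
    subs_can_refuse: bool    # may a sub decline to take part?
    sub_payment: str         # how subs are rewarded
    decommit_penalty: bool   # does a sub pay for decommitting?
    min_mpr: int             # smallest mpr the CM is used for
    max_mpr: Optional[int]   # None means no upper limit


MECHANISMS = [
    Mechanism("Asocial", False, "none (no subs involved)", False, 1, 1),
    Mechanism("Social Law", False, "equal share of the reward", False, 2, None),
    Mechanism("Pot Luck", True, "fixed rate per time step", False, 1, None),
    Mechanism("Contract Net", True, "agreed bid, paid by the manager", True, 2, None),
]


def applicable(mpr: int) -> List[str]:
    """Names of the mechanisms that can be considered for a task with this mpr."""
    return [
        m.name
        for m in MECHANISMS
        if m.min_mpr <= mpr and (m.max_mpr is None or mpr <= m.max_mpr)
    ]
```

Under these assumptions, `applicable(1)` offers Asocial and Pot Luck, while `applicable(3)` offers the three social mechanisms.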



3 Grid World Scenario 

The scenario involves a number of autonomous agents occupying a grid world (see 
[2] for more details). Tasks are generated randomly according to the experimental
conditions and are found by the agents who must decide whether or not to take 
them on. Each agent always has a specific (default) task that may be carried 
out alone. Tasks are located at squares in the grid and each has an associated 
mpr, effort and reward. One unit of effort is equivalent to one agent working on 
the task at the square for one time step (provided sufficient agents work on it in 
total). When accepting a task, an agent must decide how to tackle it; if the mpr 
is greater than one, it must recruit other agents to assist it, according to the 
various CMs at its disposal. To simplify the evaluation functions, tasks persist 
until they have been achieved. A more realistic setting with deadlines on tasks 
will require that the evaluation functions be extended to take into account the 
possibility of a task not being completed in time. 

The agents themselves have various features that can be parameterised to 
simulate different behavioural types. Each agent has a willingness to cooperate 
(wtc) factor which it uses when bidding for and evaluating tasks; when this factor 
is low (wtc < 1) the agents are greedy and when high (wtc > 1) they are selfless;
in-between (wtc = 1) agents are neutral. Agents also use a discount factor (see
below) which reflects their capacity to value short- vs long-term rewards. 

The agents move synchronously around the grid world, being capable of five 
actions: up, down, left, right and work (remain). To simplify the analysis below, 
the grid is formed into a torus so that an agent moving up from the top of the 
grid arrives at the bottom in the corresponding column; similarly for the left 
and right edges. This enables us to use a relatively simple probabilistic model of 
square occupancy by agents. 
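
A minimal sketch of the toroidal movement and the resulting uniform occupancy estimate follows. The function names, the (column, row) coordinate convention with row 0 at the top, and the assumption that agents are spread uniformly over the grid are ours, introduced for illustration.

```python
# The five actions as (column, row) offsets; row 0 is taken to be the top row.
MOVES = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
    "work": (0, 0),  # remain on the current square
}


def step(pos, action, width, height):
    """Apply one action on a toroidal grid, wrapping around at the edges."""
    dx, dy = MOVES[action]
    return ((pos[0] + dx) % width, (pos[1] + dy) % height)


def torus_distance(a, b, width, height):
    """Fewest moves between two squares when the edges wrap around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


def average_occupancy(n_agents, width, height):
    """Expected number of agents per square under a uniform spread."""
    return n_agents / (width * height)
```

For example, `step((3, 0), "up", 10, 10)` yields `(3, 9)`: an agent leaving the top of column 3 reappears at the bottom of the same column. Because every square on a torus has the same neighbourhood, a single occupancy figure applies to the whole grid.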

This scenario offers a broad range of environments in which the effective- 
ness of reasoning about the various CMs can be assessed. While this scenario 
is obviously idealised (in the tradition of the tileworld scenario for single agent 
reasoning), we believe it incorporates the key facets that an agent would face 
when making coordination decisions in realistic environments. In particular, the 
ability to systematically control variability in the scenario is needed to evaluate 
our claims about the efficacy of flexible coordination and the dynamic selection 
of CMs according to prevailing circumstances. 



4 Agent Decision-Making

Since the agents always have a specific task to achieve with mpr = 1 and 
effort = 1 they always have a non-zero expectation of their future reward. How- 
ever, they can increase this reward by initiating or participating in collaborative 
tasks (CTs) with other agents. In deciding on its collaborations, each agent aims 
to maximise its future rewards by adopting the best mechanism under the cir- 
cumstances. Thus at all instants in time, an agent will have a current goal being 
undertaken using a particular CM with the agent taking a particular role. 

Agents may find tasks in the environment or be offered them by other agents. 
In either case, however, an agent needs a means of evaluating new tasks under 
each of the CMs. It must also be able to compare these values with its current 
task so as to determine when to accept a new offer. Thus each CM has an 
associated evaluation function which it can use to approximate the value of 
potential new tasks, as well as any current task being undertaken using it. These 
evaluation functions have some common features, but they also differ between 
the CMs in that they may use different aspects of the (perceived) environment. 
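
The comparison step can be sketched as follows, assuming each CM exposes an evaluation function that returns an estimated value, or `None` when the CM cannot be applied to the task. The names `choose_mechanism` and `evaluators`, and the numeric values in the usage lines, are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# An evaluator estimates the value of handling the task under one CM,
# or returns None when that CM cannot be applied to the task.
Evaluator = Callable[[dict, dict], Optional[float]]


def choose_mechanism(
    evaluators: Dict[str, Evaluator],
    task: dict,
    env: dict,
    current_value: float = 0.0,
) -> Tuple[Optional[str], float]:
    """Pick the CM with the highest estimate, if it beats the current task.

    Returns (None, current_value) when no mechanism improves on it.
    """
    best_name, best_value = None, current_value
    for name, evaluate in evaluators.items():
        value = evaluate(task, env)
        if value is not None and value > best_value:
            best_name, best_value = name, value
    return best_name, best_value


# Made-up evaluators, purely for illustration.
evaluators = {
    "asocial": lambda task, env: None,  # not applicable when mpr > 1
    "social_law": lambda task, env: 2.0,
    "contract_net": lambda task, env: 3.5,
}
choice = choose_mechanism(evaluators, task={"mpr": 2}, env={}, current_value=0.9)
```

Passing the value of the current task as `current_value` means the agent keeps what it is doing unless some mechanism promises more.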

An important consideration that needs to be incorporated into the evaluation 
of new offers is that of liability. If, for example, an agent has agreed to participate 
under the Contract Net CM, it may incur a penalty for decommitting; any
new offer will need to be worth more than the current task plus the decommitment
penalty. Conversely, a manager must manage its liabilities so that, if a task
becomes unprofitable, it can “cut its losses” and abandon it (since the situation 
has already deteriorated from the time it was adopted). 
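
For a subordinate, the liability check reduces to a single comparison. The sketch below uses a hypothetical function name and treats all three quantities as values already discounted to the present.

```python
def sub_should_switch(new_offer_value, current_task_value, decommit_penalty=0.0):
    """A sub switches only if the new offer beats its current task plus the penalty.

    The penalty is zero for mechanisms that allow free decommitment.
    """
    return new_offer_value > current_task_value + decommit_penalty


worth_it = sub_should_switch(3.0, 2.0, decommit_penalty=0.5)    # 3.0 > 2.5
too_costly = sub_should_switch(3.0, 2.0, decommit_penalty=1.5)  # 3.0 < 3.5
```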

Our agents are myopic in that they can only see as far as the next reward;
however, since tasks may arrive at different times in the future, we discount all
rewards back to the present using a discount factor, $0 < \delta < 1$. When $\delta \approx 1$,
the difference between long- and short-term rewards is not great; however, when
$\delta \ll 1$, short-term rewards appear more attractive [7].

For the Social Law, Pot Luck and Contract Net CMs both managers and subs 
mask their valuations according to their own wtc factor. This makes coordinating 
over collaborative tasks a more attractive proposition when wtc is high and less 
attractive when it is low. Evaluation of the Asocial CM is unaffected by wtc. 

The following subsections give details of the evaluation functions used for 
each of the aforementioned CMs. These functions are designed to illustrate the 
sorts of functions that can be used to evaluate CMs. They are necessarily ap- 
proximate valuations, not least because there is a great deal of uncertainty and 
imperfect information in the scenario. We do not claim that they are the only 
ones possible, nor that they are optimal, but rather that they are reasonable and 
demonstrate that CMs can be evaluated in dynamic and unpredictable environ- 
ments using suitably parameterised functions. 

4.1 Asocial CM 

This CM simply involves the agent moving towards its task and working there 
for the necessary number of time steps before receiving the reward. To evaluate 
a task with mpr = 1, effort = e and reward = R, an agent discounts the reward
it expects by the time until it will receive it. If the distance to the task is l, the
expected value of the CM, $V_A$, is given by:

\[ V_A = R\,\delta^{\,l+e} \]

If the agent is already performing the task (l = 0), the reward is discounted by
the amount of effort remaining.

Evaluation of the Asocial CM does not require any additional environmental 
information. 
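
As a sketch, the Asocial evaluation is a single discounting step: the reward is received after the agent has travelled to the task and worked off the effort. The function name and the example numbers are illustrative.

```python
def asocial_value(reward, effort, distance, delta):
    """Reward discounted over the travel time plus the remaining effort."""
    return reward * delta ** (distance + effort)


# Default task (reward 1, effort 1) that the agent is already standing on.
default_value = asocial_value(reward=1, effort=1, distance=0, delta=0.9)

# The same distant task seen by a patient and by an impatient agent.
patient = asocial_value(reward=15, effort=10, distance=5, delta=0.9)
impatient = asocial_value(reward=15, effort=10, distance=5, delta=0.5)
```

With a discount factor of 0.9 the distant task is still worth more than the default task; with 0.5 its value is negligible, which illustrates why short-term rewards look more attractive when the discount factor is small.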



4.2 Social Law CM 

When an agent adopts the Social Law CM, the appropriate number of the nearest 
agents are commanded to come to the manager’s assistance. When the final agent 
arrives, all agents work on the task until completion and the reward is equally 
divided among them. 

To evaluate a task with mpr = m, effort = e and reward = R under this CM,
the agent must estimate the time it will take for the furthest agent to arrive at
the square. This can be calculated using the average occupancy of each square².
The manager adds up the occupancies of the nearest squares to it until it obtains
m - 1, and uses the furthest square required to estimate the time till all agents
will be present, say l. Since this CM can only be invoked on one task at a time,
the manager may also take some non-zero set up cost into account, say s. Given
these estimates, the expected value of this CM, $V_{SL}$, is given by:

\[ V_{SL} = \frac{R\,\delta^{\,l + e/m + s}}{m} \]

Evaluation of the Social Law CM requires knowledge of the distribution and 
density of other agents, here represented by average occupancy, as well as any 
set up costs which using the social law may incur. 
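
The estimate can be sketched in two steps. Both rest on assumptions of ours: that squares are counted in rings of 4d squares at distance d from the manager (which holds on a grid that is large relative to d), and that the value takes the form of the reward divided by m and discounted over l + e/m + s time steps, which is how we read the formula above. All function names are hypothetical.

```python
def distance_to_cover(needed_agents, occupancy, max_distance=1000):
    """Smallest distance d within which `needed_agents` agents are expected.

    Assumption: exactly 4*d squares lie at distance d from the manager.
    """
    expected = 0.0
    for d in range(max_distance + 1):
        expected += 4 * d * occupancy  # agents expected in the ring at distance d
        if expected >= needed_agents:
            return d
    raise ValueError("not enough agents expected within max_distance")


def social_law_value(reward, effort, mpr, delta, occupancy, setup=0.0):
    """Equal share of the reward, discounted over arrival, work and set up time."""
    arrival = distance_to_cover(mpr - 1, occupancy)
    return reward * delta ** (arrival + effort / mpr + setup) / mpr


# 15 agents on a 10 x 10 grid give an average occupancy of 0.15.
furthest = distance_to_cover(needed_agents=2, occupancy=0.15)
```

In this example the rings at distances 1 and 2 are expected to hold 0.6 and 1.2 agents respectively, so the manager must look as far as distance 3 to expect two helpers.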

4.3 Pot Luck CM 

When an agent adopts the Pot Luck CM, it makes no effort to recruit other 
agents unless they happen to enter the square where the CT is situated. In such 
cases, the manager offers the potential subordinates employment for an indefinite 
period at a fixed rate. The terms it offers reflect the agent’s perceptions about 
the wtc and discount factor of other agents; it sets a rate that, it believes, is 
sufficient to attract passers-by until the task has been achieved. Any agents that 
accept this offer and remain at the square, committed to this task, receive the 
agreed rate at each step. These offers of piecemeal employment remain until the 
task is completed or the managing agent abandons it. When the task is complete, 
the managing agent receives the reward. 

² That is, Agents / numberOfSquares, the number of agents divided by the number of squares.

To evaluate a task with mpr = m, effort = e and reward = R, the manager
assumes that subs will wander by at intervals determined by the remaining average
occupancy: if n agents are at large, the interval is given by numberOfSquares/n.
The manager computes a rate, r, to offer to the subs that it would find acceptable
itself in similar circumstances (see below). Based on these assumptions
and the task effort, the agent computes the expected completion time of the
task, ect, and the future value of the amount it will have to pay out to the subs,
p. Then the expected value of applying this CM, $V_{PL}$, is given by:

\[ V_{PL} = (R - p)\,\delta^{ect} \]

When this CM is in use, the manager uses a similar technique to evaluate the 
task in progress, taking into account any subs already helping. Clearly, if agents 
are already present, the value increases considerably. Note that, with the Pot 
Luck CM, it is possible that the manager recruits more agents than the mpr, 
meaning that the task will be achieved more quickly. 
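
A sketch of the manager's side of this calculation is given below. How ect and p are derived from the arrival intervals is not spelled out in detail, so both are taken as inputs here; the function names are hypothetical.

```python
def arrival_interval(n_squares, n_free_agents):
    """Expected number of time steps between agents wandering onto a square."""
    return n_squares / n_free_agents


def pot_luck_value(reward, payout, ect, delta):
    """Surplus after paying the subs, discounted to the expected completion time."""
    return (reward - payout) * delta ** ect


# 20 agents at large on a 10 x 10 grid: one passer-by every 5 steps on average.
interval = arrival_interval(n_squares=100, n_free_agents=20)
```

The sparser the population, the longer the interval and hence the larger ect, which is one way to see why this CM suits densely populated environments.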

Subordinates evaluating a Pot Luck offer discount the rate offered, r, indefi- 
nitely into the future: 



Thus although the rate may be low compared with the reward for their default 
task, the fact that they will receive it regularly starting from the next time step 
makes the offer relatively attractive. 
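
One reading consistent with this description, assuming a constant rate r paid at every time step from the next one onward, the sub's own discount factor $\delta$, and no wtc weighting, is a geometric series (the symbol $V_{sub}$ is ours):

```latex
\[ V_{sub} = \sum_{t=1}^{\infty} r\,\delta^{t} = \frac{r\,\delta}{1-\delta} \]
```

Under this reading, with $\delta = 0.9$ a rate of r = 0.2 is worth 0.2 × 0.9 / 0.1 = 1.8, twice the value 0.9 of a default task with reward 1 that takes one step to complete.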

Evaluation of the Pot Luck CM again requires knowledge of the distribution 
and density of other agents, though this knowledge is used in a different way. 
Managers need to be able to offer an appropriate rate to subs; in general, the 
manager will need to consider the other agents’ wtc and discount factors, though 
here it assumes them to be the same as its own. 

4.4 Contract Net CM 

Under this mechanism, a manager broadcasts a request for bids to the other 
agents. On receiving their replies, it computes the best set of bids and sends 
out firm offers of employment to the selected agents. These agents may either 
accept the offer or decline, possibly causing the managing agent to issue more 
requests. When the task has been completed, the manager receives the reward 
and pays its recruits the agreed amounts. Since the completion time of the task is 
unknown, subordinates bid an amount which reflects their current requirements, 
and when they are paid off this is factored up by their discount rate and their 
time committed. This means that the manager has an increasing liability to the 
other agents so long as the task remains unfinished — any extension to the ect 
may therefore greatly affect the value of using this CM. 

To evaluate this mechanism, the agent estimates the average distance away of 
the furthest agent (as described above) and adds to this the communication costs 
of this CM (3 time steps until subs will be committed) and the duration based 
on effort /mpr. This gives the ect. The manager also estimates the likely bid 
requirements of the subs based on its perception of their wtc, discount factors
and their specific task rewards. Thus if it anticipates completion in ect time
steps, with i subs committed 3 time steps hence, each bidding $r_j$, the value of
using the Contract Net CM is given by:

\[ V_{CN} = \delta^{ect}\left(R - \sum_{j=1}^{i} \frac{r_j}{\delta_j^{\,ect-3}}\right) \]

The reward structure of this CM is such that the subs only receive payment 
when the manager either completes or abandons the task. At this time they 
receive their bid factored up by the amount of time they have been committed
(agent i receives $r_i/\delta_i^{t}$ after being committed for t time steps). Thus, however
long they are committed to the task, the reward they receive, discounted back to 
when they started, remains the same. In this way the manager can determine its 
current liability at any stage of the coordination, e.g., when an offer is rejected. 
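
The payment rule and the manager's estimate can be sketched as below. The estimate assumes that the value is the reward minus the factored-up bids, discounted over ect steps, with subs committed from step 3 until completion; this is our reading of the formula, and the function names are hypothetical.

```python
def payout(bid, delta_sub, steps_committed):
    """Amount owed to a sub: its bid factored up by the time it was committed."""
    return bid / delta_sub ** steps_committed


def contract_net_value(reward, bids, ect, delta, setup_steps=3):
    """Manager's estimate: delta**ect * (reward - total payout at completion).

    `bids` is a list of (bid, sub_discount_factor) pairs; subs are assumed to
    be committed from `setup_steps` until the expected completion time `ect`.
    """
    committed = ect - setup_steps
    total = sum(payout(bid, d, committed) for bid, d in bids)
    return delta ** ect * (reward - total)


# The payout grows with time, but its value discounted back stays at the bid.
owed_after_5 = payout(bid=1.0, delta_sub=0.9, steps_committed=5)
discounted_back = owed_after_5 * 0.9 ** 5
```

Because the payout grows by a factor of 1/δ per step, the manager's liability rises the longer the task remains unfinished, while each sub's reward viewed from its starting time stays constant.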

An agent bidding under the Contract Net CM factors its current reward by
its wtc and projects it two time steps into the future, since this is when an offer
will arrive. It submits this amount, $r_i$, plus its required discount factor, $\delta_i$, and
the time it expects to arrive at the task. The manager selects bids based on how 
they affect its expected reward. That is, it looks at the surplus reward when each 
agent has been paid, discounted by how long until it arrives. This simplifies the 
otherwise combinatorial problem of selecting the best i bids. 

Evaluating a Contract Net CM that is underway involves computing the
current liability to agents committed, projecting this forward till the ect and 
discounting this value and the reward back to the present. As time goes by, the 
value of participating in this CM increases for subs (since the manager will be 
paying more and more at each time step) and so a sub becomes more committed 
the longer it has been involved. 

If a sub decommits, even through being coopted under Social Law, it must 
pay the manager a decommitment penalty, which is intended to compensate the 
manager for the time wasted. Although not implemented here, a more sophis- 
ticated evaluation function would require an estimate of the likelihood of subs 
decommitting. Additionally, more flexible decommitment penalties could be set 
dynamically to reflect the prevailing circumstances. This aspect of our work is 
being investigated concurrently and is reported in [6]. 

Evaluation of the Contract Net CM requires similar knowledge to the Pot 
Luck CM. 

5 Experiments, Results and Analysis 

Our first set of experiments assessed the accuracy of each CM’s evaluation func- 
tion; the circumstances under which each is selected; and to what extent this 
additional flexibility benefits the managing agent. To this end, we conducted a 
series of simulations for a 10 x 10 grid, containing just one CT with reward = 15 
(default tasks have reward = 1). Furthermore, all agents have wtc = 1 and 
discount = 0.9. We varied the number of agents (from 5 to 25), the mpr (from 
1 to 8) and the effort (from 10 to 30). Since this represents a very large sample
space, we report a selection of representative results. 




Fig. 1. Performance of CM Evaluation Functions 



Figure 1 plots the difference between the value managers actually achieved 
and their evaluations, for each CM, mpr = 1,2,3 and effort = 10. (SL and CN 
do not apply when mpr = 1; the value of PL when mpr = 3 and there are 
only 5 agents is negative and so the manager sticks with its default task.) PL is 
the least accurate, overestimating the true value by 50% in the worst case. SL 
and CN are more accurate, reflecting their lower levels of uncertainty. The graph
indicates that the accuracy of all CMs tends to improve as the agent population 
increases. Bearing in mind that, for a discount factor of 0.9, a one step error in ect 
leads to about a 10% error in evaluation, these results show that the evaluation 
functions work acceptably under a variety of conditions. This demonstrates that 
it is feasible to evaluate CMs based on estimates of environmental parameters. 
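
The 10% figure follows from the discounting itself. As a sketch of the arithmetic, holding any payout fixed so that the value V scales with $\delta^{ect}$:

```latex
\[ \frac{V(ect+1)}{V(ect)} = \frac{\delta^{ect+1}}{\delta^{ect}} = \delta = 0.9 \]
```

Overestimating ect by one step therefore undervalues the mechanism by 10%, while underestimating it by one step overvalues it by 1/0.9 - 1, or about 11%.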

Figure 2 shows the type of CM selected by the managers under different 
agent densities and task profiles. It is clear that, at least from the manager’s 
point of view, for high agent densities and low mpr, PL is the most preferred. 
This is due to PL’s low communication costs (no active recruitment), low payout 
rate to subordinates and the ability to over-recruit. However, under more general 
conditions, CN is the most preferred: the communication cost (set up delay) and 
the relatively high rates paid to subs are compensated for by the reduced time 
till all agents arrive. For tasks with extremely stringent requirements, i.e., high 
mpr or effort in low density populations, the speed and certainty of SL lead to 
it being selected over CN. In fact, the choice between CN and SL depends mainly 
on the reward assigned to the task: for lower task rewards, SL is preferable to 
CN. In the extreme, even SL is unprofitable and managers opt to perform their 
default tasks instead. 

Fig. 2. Preferred CM by Task/Environment 

Figure 3 shows the total reward achieved by managers using each CM in 
isolation, and when the managers have the capability to dynamically choose 
which CM to use. The number of agents is fixed at 15, and the task profiles 
have fixed reward = 15, effort = 10 and mpr ranging from 1 to 8. As the mpr 
increases, the surplus reward available to managers naturally decreases, but the 
graph shows that agents who can coordinate flexibly maintain high levels of 
reward (in fact the line for ALL roughly traces the maximum over all CMs). 

When taken together, these results clearly show that providing alternative 
mechanisms for coordination is beneficial to agents that are required to interop- 
erate under changeable environmental or task conditions. 

Our second set of experiments examined the overall effect on the system when 
the agents displayed different characteristics in terms of their wtc and discount 
factors. The general hypothesis here is that when agents are less greedy, more 
collaborative tasks will be achieved and that an agent’s perspective on future 
rewards may well affect the type of CM it chooses. To test this, we conducted a 
series of experiments varying wtc from 0.25 to 3 and discount from 0.5 to 0.95 
(the number of agents was fixed at 15, and the tasks have fixed reward = 15, 
effort = 10 and mpr = 2). 

Fig. 3. Total Reward Achieved by Managers 

Figure 4 shows the total number of CTs achieved for each discount factor as 
wtc increases. When the discount factor was 0.8 or less, the managers always 
selected SL, because this minimises the ect. However, when the agent’s wtc is 
low, it devalues its own reward for CTs to the extent that the default task is 
often chosen instead. The choices for discounts of 0.9 and 0.95 were identical, 
with PL being chosen for wtc ≤ 1 (agents neutral or greedy) and CN being 
chosen otherwise (agents selfless). This indicates that, if potential subordinates 
are likely to require relatively large rewards and the manager can afford to wait, 
it will choose to do so, paying out less to subordinates. These results confirm 
that the relative value of using different CMs is indeed affected by both the 
individual characteristics of an agent and its beliefs about the environment in 
which it operates. 



6 Related Work 

The majority of previous work on multi-agent system coordination assumes it is 
a design time problem (e.g., [10,11,4]). However, [5] has argued that agents need 
the flexibility to coordinate at different levels of abstraction, depending upon 
their particular needs at a given moment in time. To date, however, this work 
has not developed mechanisms for explicitly reasoning about which level to co- 
ordinate at in a given situation. Such flexibility was also built into cooperative 
problem solving agents by [9]. Here, agents could choose to cooperate according 
to various conventions which dictated how they should behave in a particular 
team context. These conventions varied in terms of the time they took to es- 
tablish and the communication overhead they imposed. However, again, there 
was no reasoning mechanism for determining which convention was appropriate 
for a given situation. Boutilier [3] presents a decision making framework, 
based on multi-agent Markov decision processes, that does reason about the 
state of a coordination mechanism. However, his work is concerned with optimal 
reasoning within the context of a given coordination mechanism, rather than 
actually reasoning about which mechanism to employ in a particular situation. 
[1] present a software engineering framework that enables agents to vary their 
CMs according to their prevailing circumstances. They also identify criteria for 
determining when particular mechanisms are appropriate. However, the decision 
procedures for actually trading-off these criteria are not well developed. Finally, 
[2] provide a framework in which CMs are characterised by set up costs and 
probability of success and can be evaluated accordingly; however, their agents 
use a contract-net style protocol for all their interactions. 

Fig. 4. CTs Achieved by WTC and Discount Factor 

7 Conclusions 

This paper presented a framework in which agents can evaluate and apply differ- 
ing CMs and has demonstrated that agents can benefit from such flexibility. It 
has shown that CMs can be practically evaluated using appropriate environmen- 
tal parameters despite the uncertainty agents face. However, these experiments 
use static environmental conditions and the agents involved use assumptions 
about the environment. To overcome these restrictions, in our future work we 
will allow agents to monitor and learn the relevant environmental parameters so 
that they can react to dynamic environments. Agents will then be in a position 
to adapt their own attributes (wtc and discount) to better suit their circumstances. 
Given agents that learn and adapt to their environment, it will also be 
important to assess whether alternative evaluation functions or heuristics impact 
on agent performance. 

Abstract. We present an approach to agents that can reason, react to 
the environment and are able to update their own knowledge as a result of 
new incoming information. Each agent’s view of the social relationships 
among agents (itself and others) is represented by a graph, which in turn 
can be updated, allowing for the representation of such social evolution. 



1 Introduction and Motivation 

Over recent years, the notion of agency has claimed a major role in defining 
the trends of modern research. Influencing a broad spectrum of disciplines such 
as Sociology and Psychology, among others, the agent paradigm has virtually invaded 
every sub-field of Computer Science. Besides allowing for a unified declarative 
and procedural semantics, eliminating the traditional wide gap between theory 
and practice, the use of several quite powerful results in the field of 
non-monotonic extensions to Logic Programming (LP), such as belief revision, 
inductive learning, argumentation, preferences, abduction, etc. [15,17], can represent 
an important added value to the design of rational agents. These results, 
together with the improvement in efficiency (cf. [5,16,18]), allow Logic Programming 
and Non-monotonic Reasoning to accomplish a fruitful degree of combination 
between reactive and rational behaviours of agents, whilst preserving the clear 
and precise specification enjoyed by declarative languages. With this goal in mind, 
Kowalski and Sadri [11] advanced an agent architecture based on an observe-think-act 
cycle. It was further developed by Dell’Acqua and Pereira [6] to allow 
the agents to dynamically update their own knowledge base (whether intensional 
or extensional) as well as their own goals. The capability of updating provided 
to these agents is inherited from Dynamic Logic Programming [1]. 

Dynamic Logic Programming (DLP) was introduced to treat updates as applying 
to the logic programming rules that make up a theory, rather than performing 
updates on a model basis [14]. According to DLP, knowledge is conveyed 
by a set of theories (encoded as generalized logic programs) representing 
different states of the world. Different states may represent distinct dimensions 
such as different time periods, different hierarchical instances, or even different 
domains. Consequently, the individual theories may contain mutually contradictory 
and overlapping information. The role of DLP is to take into account the 
mutual relationships extant between different states to precisely determine the 
declarative and the procedural semantics of the combined theory comprised of 
all individual theories and the way they relate. 

Although DLP can represent several states in one evolving dimension or aspect 
of a system, no more than one such aspectual evolution can be encoded 
and combined simultaneously. This is so because DLP is defined only for linear 
sequences of states. In [15], the states represent different time instants. To overcome 
this drawback, Leite et al. introduced Multi-dimensional Dynamic Logic 
Programming (MDLP) [12], which generalizes DLP to allow for collections of 
states organized in arbitrary directed acyclic graphs (DAGs). Within this more 
general theory, one can encode simultaneously all representational dimensions, 
which can be particularly useful in the context of multi-agent systems. 

In this paper, we formalize agents with such capabilities, generalizing the 
framework of [15] to allow for an arbitrary number of dimensions represented by 
an arbitrary DAG. We will show how this new theory is useful for an agent to 
represent and reason about its own and other agents’ knowledge and its evolution 
in time. Each agent’s view of the evolving social relationships among agents 
(itself and others) is represented by a DAG, itself in turn updatable, to capture 
the representation of social evolution. 

The contribution of the paper is therefore twofold. On the one side, the 
paper presents an extension of the framework for multi-dimensional dynamic 
logic programming to incorporate integrity constraints and active rules, and to 
make the DAG of each agent updatable. On the other, the paper provides a 
means to incorporate the obtained framework into a multi-agent architecture. 
For simplicity, we have considered propositional generalized logic programs. 

2 Preliminaries 

Graphs. A directed graph, or digraph, $D = (V, E)$ is a pair of two finite or 
infinite sets $V = V_D$ of vertices and $E = E_D$ of pairs of vertices or (directed) 
edges. A directed edge sequence from $v_0$ to $v_n$ in a digraph is a sequence of edges 
$e_1, e_2, \ldots, e_n \in E_D$ such that $e_i = (v_{i-1}, v_i)$ for $i = 1, \ldots, n$. A directed path is 
a directed edge sequence in which all the edges are distinct. A directed acyclic 
graph, or acyclic digraph (DAG), is a digraph $D$ such that there are no directed 
edge sequences from $v$ to $v$, for all vertices $v$ of $D$. A source is a vertex with 
in-valency 0 (number of edges for which it is a final vertex) and a sink is a vertex 
with out-valency 0 (number of edges for which it is an initial vertex). We say 
that $v < w$ if there is a directed path from $v$ to $w$ and that $v \leq w$ if $v < w$ or 
$v = w$. The transitive closure of a graph $D$ is a graph $D^+ = (V, E^+)$ such that 
for all $v, w \in V$ there is an edge $(v, w)$ in $E^+$ if and only if $v < w$ in $D$. The 
relevancy DAG of a DAG $D$ wrt a vertex $v$ of $D$ is $D_v = (V_v, E_v)$ where 
$V_v = \{v_i : v_i \in V \text{ and } v_i \leq v\}$ and $E_v = \{(v_i, v_j) : (v_i, v_j) \in E \text{ and } v_i, v_j \in V_v\}$. 
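
The relevancy DAG keeps exactly the vertices from which the chosen vertex can be reached, together with the edges among them. A minimal Python sketch of that construction follows; the function names and the small example graph are illustrative assumptions, not part of the formal framework.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def ancestors_or_self(edges: Iterable[Edge], v: Hashable) -> Set[Hashable]:
    """Return {u : u <= v}: v itself plus every vertex with a directed path to v."""
    predecessors = {}
    for a, b in edges:
        predecessors.setdefault(b, set()).add(a)
    seen = {v}
    stack = [v]
    while stack:
        node = stack.pop()
        for p in predecessors.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def relevancy_dag(edges: Iterable[Edge], v: Hashable):
    """Return (V_v, E_v), the relevancy DAG of the graph with respect to vertex v."""
    edges = set(edges)
    v_v = ancestors_or_self(edges, v)
    e_v = {(a, b) for (a, b) in edges if a in v_v and b in v_v}
    return v_v, e_v


if __name__ == "__main__":
    example_edges = {("a", "b"), ("b", "c"), ("a", "d")}
    print(relevancy_dag(example_edges, "c"))
```

In the example, vertex d is dropped because there is no directed path from d to c.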

Logic Programming Framework. In order to represent negative information 
in logic programs, we need more general logic programs that allow default negation 
$\mathit{not}\ A$ not only in the premises of their clauses but also in their heads¹. We call 
such programs generalized logic programs. It is convenient to syntactically represent 
generalized logic programs as propositional Horn theories. In particular, 
we represent default negation $\mathit{not}\ A$ as a standard propositional variable. 

Propositional variables whose names do not begin with “not” and do not 
contain the symbols “:” and “÷” are called objective atoms. Propositional 
variables of the form $\mathit{not}\ A$ are called default atoms. Objective atoms and 
default atoms are generically called atoms. 

Propositional variables of the form $i{:}C$ (where $C$ is defined below) are called 
projects. $i{:}C$ denotes the intention (of some agent $j$) of proposing to update 
the theory of agent $i$ with $C$. Propositional variables of the form $j \div C$ 
are called updates. $j \div C$ denotes an update, proposed by $j$, 
of the current theory (of some agent $i$) with $C$. We assume that updates 
cannot be negated (i.e., we disallow $\mathit{not}\ i \div C$). Projects, instead, can be 
negated. A negated project $\mathit{not}\ i{:}C$ denotes the intention of the agent not to 
perform project $i{:}C$. 

Let $\mathcal{K}$ be a set of propositional variables consisting of objective 
atoms and projects such that $\mathit{false} \notin \mathcal{K}$. The propositional language 
$\mathcal{L}_{\mathcal{K}}$, generated by $\mathcal{K}$, consists of the following set of propositional variables: 
$\mathcal{L}_{\mathcal{K}} = \mathcal{K} \cup \{\mathit{not}\ A \mid \forall \text{ atom } A \in \mathcal{K}\} \cup \{i \div C,\ \mathit{not}\ i{:}C \mid \forall\, i{:}C \in \mathcal{K}\}$. 
A generalized rule in $\mathcal{L}_{\mathcal{K}}$ is of the form $L_0 \leftarrow L_1 \wedge \ldots \wedge L_n$ $(n \geq 0)$, where every $L_i$ 
$(0 \leq i \leq n)$ is an atom from $\mathcal{L}_{\mathcal{K}}$. An integrity constraint in $\mathcal{L}_{\mathcal{K}}$ is a rule of 
the form $\mathit{false} \leftarrow L_1 \wedge \ldots \wedge L_n \wedge Z_1 \wedge \ldots \wedge Z_m$ $(n \geq 0, m \geq 0)$, where every $L_i$ 
$(1 \leq i \leq n)$ is an atom, and every $Z_j$ $(1 \leq j \leq m)$ is a project from $\mathcal{L}_{\mathcal{K}}$. Integrity 
constraints are rules that enforce some condition over the state, and therefore 
take the form of denials. A generalized logic program $P$ in $\mathcal{L}_{\mathcal{K}}$ is a set of generalized 
rules and integrity constraints in $\mathcal{L}_{\mathcal{K}}$. A query $Q$ in $\mathcal{L}_{\mathcal{K}}$ is of the form 
$?{-}\ L_1 \wedge \ldots \wedge L_n$ $(n \geq 1)$, where every $L_i$ $(1 \leq i \leq n)$ is an atom from $\mathcal{L}_{\mathcal{K}}$. 

The following definition introduces rules that are evaluated bottom-up. To emphasize 
this aspect, we employ a different notation for them. An active rule in $\mathcal{L}_{\mathcal{K}}$ is a rule 
of the form $L_1 \wedge \ldots \wedge L_n \Rightarrow Z$ $(n \geq 0)$, where every $L_i$ $(1 \leq i \leq n)$ is an atom, 
and $Z$ is a project from $\mathcal{L}_{\mathcal{K}}$. Active rules are rules that modify the current state 
when executed. Active rules take the form $\mathit{action\_name} \wedge \mathit{Preconditions} \Rightarrow Z$, 
where $\mathit{action\_name}$ is an abducible. If the $\mathit{Preconditions}$ of the rule are satisfied, 
then the project (fluent) $Z$ can be selected and executed. The head of an active 
rule must be a project that is either internal or external. An internal project 
operates on the state of the agent, e.g., if an agent gets an observation, then it 
updates its knowledge, or if some conditions are met, then it proves some goal, 
etc. External projects instead are performed on the environment, e.g., when an 
agent sends a message to another agent. 
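
To make the construction of the propositional language concrete, the following sketch builds it from a set of objective atoms and a set of projects. Encoding the variables as plain strings, the pair encoding of projects, and the name `generate_language` are assumptions made only for this illustration.

```python
from typing import Iterable, Set, Tuple

DIV = "\u00f7"  # the update symbol, written j÷C in the text


def generate_language(objective_atoms: Iterable[str],
                      projects: Iterable[Tuple[str, str]]) -> Set[str]:
    """Build the language generated by K, where K = objective atoms plus projects.

    Each project is given as a pair (i, C) standing for the project i:C.
    """
    language = set()
    for atom in objective_atoms:
        language.add(atom)              # the objective atom A itself
        language.add(f"not {atom}")     # its default atom, not A
    for agent, content in projects:
        language.add(f"{agent}:{content}")        # the project i:C
        language.add(f"not {agent}:{content}")    # the negated project, not i:C
        language.add(f"{agent}{DIV}{content}")    # the update i÷C (never negated)
    return language


if __name__ == "__main__":
    for variable in sorted(generate_language({"happy"}, [("a", "move_out")])):
        print(variable)
```

The sketch mirrors the asymmetry in the definition: projects have a negated counterpart in the language, while updates do not.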

Given a set of vertices $V$, we assume that for every project $i{:}C$ in $\mathcal{K}$, $C$ is either 
a generalized rule, an integrity constraint, an active rule, a query or an atom 
of the form $\mathit{modify\_edge}(j,u,v,x)$, $\mathit{not}\ \mathit{modify\_edge}(j,u,v,x)$, $\mathit{not}\ \mathit{edge}(u,v)$, or 
$\mathit{edge}(u,v)$ where $u, v \in V$. Thus, a project can only take one of the following 
forms: 

$i{:}(L_0 \leftarrow L_1 \wedge \ldots \wedge L_n)$ 
$i{:}(L_1 \wedge \ldots \wedge L_n \Rightarrow Z)$ 
$i{:}(\mathit{false} \leftarrow L_1 \wedge \ldots \wedge L_n \wedge Z_1 \wedge \ldots \wedge Z_m)$ 
$i{:}(?{-}\ L_1 \wedge \ldots \wedge L_n)$ 
$i{:}\mathit{modify\_edge}(j,u,v,x)$ 
$i{:}\mathit{edge}(u,v)$ 
$i{:}\mathit{not}\ \mathit{modify\_edge}(j,u,v,x)$ 
$i{:}\mathit{not}\ \mathit{edge}(u,v)$ 

Note that the predicates $\mathit{edge}$ and $\mathit{modify\_edge}$ can only occur inside projects or 
updates since those predicates do not belong to $\mathcal{L}_{\mathcal{K}}$. We assume that $i{:}\mathit{edge}(u,v)$ 
and $i{:}(?{-}\ L_1, \ldots, L_n)$ can only be internal projects, that is, only the agent 
itself can issue a project to update its own DAG (i.e., $\mathit{edge}(u,v)$) and goals (i.e., 
$?{-}\ L_1, \ldots, L_n$). The project $i{:}\mathit{edge}(u,v)$ issued by the agent $i$ denotes the intention 
of $i$ to modify its own DAG by establishing an edge between the vertices $u$ 
and $v$ in $V$. By issuing a project $i{:}\mathit{modify\_edge}(j,u,v,x)$ the agent $i$ expresses the 
intention to modify the DAG of another agent $j$ by adding/deleting (depending 
on whether $x = a$ or $x = d$) an edge between $u$ and $v$. 

¹ For further motivation and intuitive reading of logic programs with default negations 
in the heads see [1]. 



3 Agents and Multi-agent Systems 

This section presents the notion of agent and multi-agent system. The initial 
theory of an agent is characterized by a multi-dimensional abductive logic pro- 
gram, inspired by [12], which expresses the agent’s viewpoint on the relationships 
amongst a collection of agents, and encoded in a directed acyclic graph. 

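Definition 1. Let $\mathcal{L}_{\mathcal{K}}$ be a propositional language. Let $\mathcal{M}$ be a multi-agent 
system (defined below) and $\alpha$ an agent in $\mathcal{M}$. A multi-dimensional abductive 
logic program for $\alpha$, written as $\mathcal{T}_\alpha$, is a tuple $\mathcal{T} = \langle D, \mathcal{P}_D, \mathcal{A}, \mathcal{R}_D \rangle$ where: 

— $D = (V, E)$ is an acyclic directed graph, $V = \{v \mid \text{for every } \Psi_v \in \mathcal{M}\} \cup \{\alpha'\}$; 

— $\mathcal{P}_D = \{P_v \mid v \in V\}$ is a set of generalized logic programs in the language 
$\mathcal{L}_{\mathcal{K}}$, indexed by the vertices $v \in V$ of $D$; 

— $\mathcal{A}$ is a set of atoms; 

— $\mathcal{R}_D = \{R_v \mid v \in V\}$ is a set of sets of active rules in the language $\mathcal{L}_{\mathcal{K}}$, indexed 
by the vertices $v \in V$ of $D$. 

We call the distinguished vertex $\alpha' \in V$ the inspection point of agent $\alpha$, and 
we call the atoms in $\mathcal{A}$ abducibles. 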

The initial theory of an agent $\alpha$ is determined (1) by a DAG $D$ that represents 
the relation between the agents (represented by the vertices in $V$) in the multi-agent 
system; (2) by a set of generalized logic programs, one program $P_v$ for 
each agent $v \in V$; (3) by a set of abducibles; and (4) by a set of sets of active 
rules, one set $R_v$ for each agent $v \in V$. 

Without loss of generality, we can assume that abducibles have no matching 
clauses in $P = \bigcup_{v \in V} P_v$. Abducibles can be thought of as hypotheses that can be 
used to extend the given logic program in order to provide an “explanation”, or 
conditional answer, to a query. Explanations are required to satisfy the integrity 
constraints in $P$. 

The multi-dimensional abductive logic program formalizes the initial knowledge 
state of the agent and its reactive behaviour. The knowledge of an agent 
can dynamically evolve when the agent receives new knowledge, albeit by self-updating 
rules. This is represented in the form of an updating program. 

Definition 2. Let $\mathcal{M}$ be a multi-agent system (defined below). An updating 
program $U$ is a finite set of updates such that if $v \div C \in U$ then $\Psi_v \in \mathcal{M}$. 

An updating program contains the updates that will be performed on the current 
knowledge state of the agent. To characterize the evolution of the knowledge of 
an agent we need to introduce the notion of sequence of updating programs. 
Let $S = \{0, 1, \ldots, m\}$ be a set of natural numbers. We call the elements $s \in S$ 
states. A sequence of updating programs $\mathcal{U} = \{U^s \mid s \in S \text{ and } s > 0\}$ is a set of 
updating programs superscripted by the states $s \in S$. 

Definition 3. Let $s \in S$ be a state. An agent $\alpha$ at state $s$, written as $\Psi^s_\alpha$, is a 
pair $(\mathcal{T}, \mathcal{U})$, where $\mathcal{T}$ is the multi-dimensional abductive logic program of $\alpha$, and 
$\mathcal{U} = \{U^1, \ldots, U^s\}$ is a sequence of updating programs. If $s = 0$, then $\mathcal{U} = \{\}$. 

An agent $\alpha$ at state 0 is defined by its initial theory and an empty sequence of 
updating programs, that is $\Psi^0_\alpha = (\mathcal{T}, \{\})$. At state 1, $\alpha$ is defined by $(\mathcal{T}, \{U^1\})$, 
where $U^1$ is the updating program containing all the updates that $\alpha$ has received 
at state 1, either from other agents or as self-updates. In general, an agent $\alpha$ 
at state $s$ is defined by $\Psi^s_\alpha = (\mathcal{T}, \{U^1, \ldots, U^s\})$, where each $U^i$ is the updating 
program containing the updates that $\alpha$ has received at state $i$. 

Definition 4. A multi-agent system $\mathcal{M} = \{\Psi^s_{v_1}, \ldots, \Psi^s_{v_n}\}$ at state $s$ is a set 
of agents $v_1, \ldots, v_n$, each of which is at state $s$. 

Note that the definition of multi-agent system characterizes a static society of 
agents in the sense that it is not possible to add/remove agents from the system, 
and all the agents are at one common state. 

To begin with, the system starts at state 0 where each agent is defined by 
$\Psi^0_\alpha = (\mathcal{T}_\alpha, \{\})$. Suppose that at state 0 an agent $\beta$ wants to propose an update 
of the knowledge state of agent $\alpha$ with $C$ by triggering the project $\alpha{:}C$. Then, 
at the next common state $\alpha$ will receive the update $\beta \div C$ indicating that an 
update has been proposed by $\beta$. Thus, at state 1, $\alpha$ will be $(\mathcal{T}_\alpha, \{U^1\})$, where 
$U^1 = \{\beta \div C\}$ if no other updates occur for $\alpha$. 

Within logic programs we refer to agents by using the corresponding subscript. 
For instance, if we want to express the update of the theory of an agent 
$\Psi_\alpha$ with $C$, we write the project $\alpha{:}C$. 
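
The step from one common state to the next can be sketched as follows: every project executed at the current state is delivered, at the next state, as an update in the target agent's new updating program. The class and function names are hypothetical, the theory is left as an opaque placeholder, and projects addressed to agents outside the system are simply ignored, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Project = Tuple[str, str]  # (target, content), standing for the project target:content
Update = Tuple[str, str]   # (proposer, content), standing for the update proposer÷content


@dataclass
class Agent:
    name: str
    theory: object = None  # placeholder for the initial theory T
    updating_programs: List[Set[Update]] = field(default_factory=list)  # U^1, ..., U^s

    @property
    def state(self) -> int:
        """The current state s equals the number of updating programs received."""
        return len(self.updating_programs)


def step(agents: Dict[str, Agent], executed: Dict[str, List[Project]]) -> None:
    """Move every agent to the next common state.

    `executed` maps an issuing agent's name to the projects it executed
    at the current state.
    """
    inbox: Dict[str, Set[Update]] = {name: set() for name in agents}
    for issuer, projects in executed.items():
        for target, content in projects:
            if target in inbox:  # assumption: unknown targets are ignored
                inbox[target].add((issuer, content))
    for name, agent in agents.items():
        agent.updating_programs.append(inbox[name])


if __name__ == "__main__":
    system = {"alpha": Agent("alpha"), "beta": Agent("beta")}
    step(system, {"beta": [("alpha", "C")]})
    print(system["alpha"].state, system["alpha"].updating_programs)
```

Appending an updating program to every agent, even an empty one, keeps all agents at one common state, as the definition of a multi-agent system requires.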

The agent’s semantics is given by a syntactical transformation (defined in 
Sect. 3.2) that produces a generalized logic program for each agent and applies to 
queries for it as well. Nevertheless, to increase readability, we first 
present a case study example that, though simple, allows one to glean some of the 
capabilities of the framework, relying on natural language 
when explaining some details. The agent cycle, mentioned in the example and 
defined in Sect. 4, operates on the results of the transformation, but the example 
can be understood, on a first reading, without requiring detailed knowledge of 
the transformations. 

Fig. 1. DAG of Alfredo at states 0 and 1 

3.1 Case Study 

This example shows how the DAG of an agent can be modified through updates, 
and how its knowledge state and reactions are affected by such modifications. 
With this aim, we hypothesize two alternative scenarios. Alfredo lives together 
with his parents, and he has a girlfriend. Being a conservative lad, Alfredo never 
does anything that contradicts those he considers to be higher authorities, in 
this case his mother, his father, and the judge who will appear later in the story. 
Furthermore, he considers the judge to be hierarchically superior to each of his 
parents. As for the relationship with his girlfriend, he hears her opinions but his 
own always prevail. Therefore Alfredo’s view of the relationships between himself 
and other agents can be represented by the following DAG $D = (V, E)$, depicted 
in Fig. 1a) (where the inspection point of Alfredo, i.e. the vertex corresponding to 
Alfredo’s overall semantics, is represented by a bold circle): $V = \{\alpha, \beta, \gamma, \mu, \varphi, \alpha'\}$ 
and $E = \{(\gamma, \alpha), (\alpha, \mu), (\alpha, \varphi), (\mu, \beta), (\varphi, \beta), (\beta, \alpha')\}$ where $\alpha$ is Alfredo, $\beta$ is 
the judge, $\gamma$ is the girlfriend, $\mu$ is the mother, $\varphi$ is the father and $\alpha'$ is the 
inspection point of Alfredo. 

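Initially, Alfredo’s programs $P_{\alpha_0}$ and $R_{\alpha_0}$ at state 0 contain the rules: 

r1: $\mathit{girlfriend} \leftarrow$ 
r2: $\mathit{move\_out} \Rightarrow \alpha{:}\mathit{rent\_apartment}$ 
r3: $\mathit{get\_married} \wedge \mathit{girlfriend} \Rightarrow \gamma{:}\mathit{propose}$ 
r4: $\mathit{not}\ \mathit{happy} \Rightarrow \varphi{:}(?{-}\ \mathit{happy})$ 
r5: $\mathit{not}\ \mathit{happy} \Rightarrow \mu{:}(?{-}\ \mathit{happy})$ 
r6: $\mathit{modify\_edge}(\beta, u, v, a) \Rightarrow \alpha{:}\mathit{edge}(u, v)$ 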

stating that: Alfredo has a girlfriend (r1); if he decides to move out, he has to 
rent an apartment (r2); if he decides to get married, provided he has a girlfriend, 
he has to propose (r3); if he is not happy he will ask his parents how to be happy 
(r4, r5); Alfredo must create a new edge between two agents (represented by $u$ 
and $v$) in his DAG every time he proves the judge told him so (r6). Alfredo’s set 
of abducibles, corresponding to the actions he can decide to perform to reach his 
goals, is: $\mathcal{A} = \{\mathit{move\_out}, \mathit{get\_married}\}$. In the first scenario, Alfredo’s goal is to 
be happy, this being represented at the agent cycle level (cf. Sect. 4) together with 
$\mathit{not}\ \mathit{false}$ (to preclude the possibility of violation of integrity constraints) and with 
the conjunction of active rules: $G = ?{-}\ (\mathit{happy} \wedge \mathit{not}\ \mathit{false} \wedge r2 \wedge r3 \wedge r4 \wedge r5 \wedge r6)$. 
During the first cycle (we will assume that there are always enough computational 
units to complete the proof procedure), the projects $\varphi{:}(?{-}\ \mathit{happy})$ and 
$\mu{:}(?{-}\ \mathit{happy})$ are selected. Note that these correspond to the only two active 
rules (r4 and r5) whose premises are verified. Both projects are selected to be 
performed, producing 2 messages to agents $\varphi$ and $\mu$ (the reader is referred to [8] 
for details on how communication can be combined into this framework). 

In response to Alfredo’s request to his parents, Alfredo’s mother, as most Latin 
mothers, tells him that if he wants to be happy he should move out, and that he 
should never move out without getting married. This corresponds to the observed 
updates $\mu \div (\mathit{happy} \leftarrow \mathit{move\_out})$ and $\mu \div (\mathit{false} \leftarrow \mathit{move\_out} \wedge \mathit{not}\ \mathit{get\_married})$ 
which produce the program $P_{\mu_1}$ at state 1 containing the following rules: 

r7: $\mathit{happy} \leftarrow \mathit{move\_out}$ 
r8: $\mathit{false} \leftarrow \mathit{move\_out} \wedge \mathit{not}\ \mathit{get\_married}$ 

Alfredo’s father, on the other hand, not very happy with his own marriage and 
with the fact that he had never lived alone, tells Alfredo that, to be happy 
he should move out and live by himself, i.e. he will not be happy if he gets 
married now. This corresponds to the observed updates $\varphi \div (\mathit{happy} \leftarrow \mathit{move\_out})$ 
and $\varphi \div (\mathit{not}\ \mathit{happy} \leftarrow \mathit{get\_married})$ which produce the program $P_{\varphi_1}$ at state 1 
containing the following rules: 

r9: $\mathit{happy} \leftarrow \mathit{move\_out}$ 
r10: $\mathit{not}\ \mathit{happy} \leftarrow \mathit{get\_married}$ 

The situation after these updates is represented in Fig. 1b). 

After another cycle, the IFF proof procedure returns no projects because the 
goal is not reachable without producing a contradiction. From r7 and r8 one 
could abduce $\mathit{move\_out}$ to prove $\mathit{happy}$ but, in order to satisfy r8, $\mathit{get\_married}$ 
would have to be abduced also, producing a contradiction via r10. This is indeed 
so because, according to the DAG, the rules of $P_{\varphi_1}$ and $P_{\mu_1}$ cannot reject one 
another since there is no path from one to the other. Thus, in this scenario, 
Alfredo is not reactive at all, being unable to take any decision. 

Consider now this second scenario where the initial theory of Alfredo is the 
same as in the first, but his parents have decided to get divorced. Suppose that 
at state 1 Alfredo receives from the judge the update $\beta \div \mathit{modify\_edge}(\beta, \varphi, \mu, a)$ 
corresponding to the decision to give Alfredo’s custody to his mother after his 
parents’ divorce. This produces the program $P_{\beta_1}$ containing the rule: 

r11: $\mathit{modify\_edge}(\beta, \varphi, \mu, a) \leftarrow$ 

After this update, the projects $\varphi{:}(?{-}\ \mathit{happy})$, $\mu{:}(?{-}\ \mathit{happy})$ and $\alpha{:}\mathit{edge}(\varphi, \mu)$ 
are selected for execution, which correspond to the evaluation of the active rules 
r4, r5 and r6. In response to Alfredo’s request to his parents, given the precedence 
of opinion imposed by the judge, suppose that they reply with the same rules 
as in the first scenario, producing the following two programs: $P_{\mu_2}$ containing 
the rules received from the mother: 

r12: $\mathit{happy} \leftarrow \mathit{move\_out}$ 
r13: $\mathit{false} \leftarrow \mathit{move\_out} \wedge \mathit{not}\ \mathit{get\_married}$ 

and $P_{\varphi_2}$ containing the rules received from the father: 

r14: $\mathit{happy} \leftarrow \mathit{move\_out}$ 
r15: $\mathit{not}\ \mathit{happy} \leftarrow \mathit{get\_married}$ 

The update $\alpha \div \mathit{edge}(\varphi, \mu)$ produces a change in the graph, and the current situation 
is the one depicted in Fig. 2. Now, the proof procedure returns the projects 
$\alpha{:}\mathit{rent\_apartment}$ and $\gamma{:}\mathit{propose}$. These projects correspond to the evaluation 
of rules r2 and r3 after the abduction of $\{\mathit{move\_out}, \mathit{get\_married}\}$ to prove the 
goal $(?{-}\ \mathit{happy})$. Note that now it is possible to achieve this goal without reaching 
a contradictory state since henceforth the advice from Alfredo’s mother (r12 
and r13) prevails over and rejects that of his father (r14 and r15). In this second 
scenario, Alfredo gets married, rents an apartment, moves out with his new wife 
and lives happily ever after. 

Fig. 2. DAG of Alfredo at state 2 
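
The difference between the two scenarios comes down to whether a directed path links the two parents in Alfredo's DAG. The sketch below checks this for the case-study graph, with agents named by plain strings instead of Greek letters. It ignores the state indices used by the full transformation, and the helper names `has_path` and `can_reject` are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def has_path(edges: Set[Edge], src: str, dst: str) -> bool:
    """True if there is a directed path with at least one edge from src to dst."""
    stack = [b for (a, b) in edges if a == src]
    seen = set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False


def can_reject(edges: Set[Edge], winner: str, loser: str) -> bool:
    """Rules at `winner` can reject rules at `loser` only if loser is an ancestor of winner."""
    return has_path(edges, loser, winner)


# Alfredo's initial DAG from the case study.
INITIAL_EDGES = {
    ("girlfriend", "alfredo"), ("alfredo", "mother"), ("alfredo", "father"),
    ("mother", "judge"), ("father", "judge"), ("judge", "inspection_point"),
}

if __name__ == "__main__":
    # First scenario: the parents are incomparable, so neither can reject the other.
    print(can_reject(INITIAL_EDGES, "mother", "father"))
    # Second scenario: the judge's decision adds the edge (father, mother).
    updated = INITIAL_EDGES | {("father", "mother")}
    print(can_reject(updated, "mother", "father"))
```

With the added edge the father becomes an ancestor of the mother, so her rules prevail over his, while the reverse still does not hold.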

3.2 Syntactical Transformation 

The semantics of an agent $\alpha$ at state $m$, $\Psi^m_\alpha$, is established by a syntactical 
transformation (called multi-dimensional dynamic program update $\oplus$) that, given 
$\Psi^m_\alpha = (\mathcal{T}, \mathcal{U})$, produces an abductive logic program [6] $\langle P, \mathcal{A}, R \rangle$, where $P$ is a 
normal logic program (that is, default atoms can only occur in bodies of rules), 
$\mathcal{A}$ is a set of abducibles and $R$ is a set of active rules. Default negation can then 
be removed from the bodies of rules in $P$ via the dual transformation defined by 
Alferes et al. [3]. The transformed program $P'$ is a definite logic program. Consequently, 
a version of the IFF proof procedure proposed by Kowalski et al. [10] 
and simplified by Dell’Acqua et al. [6] to take into consideration propositional 
definite logic programs, can then be applied to $\langle P', \mathcal{A}, R \rangle$. 

In the following, to keep notation simple we write rules as $A \leftarrow B \wedge \mathit{not}\ C$, 
rather than as $A \leftarrow B_1 \wedge \ldots \wedge B_m \wedge \mathit{not}\ C_1 \wedge \ldots \wedge \mathit{not}\ C_n$. 

Suppose that $\langle D, \mathcal{P}_D, \mathcal{A}, \mathcal{R}_D \rangle$, with $D = (V, E)$, is the multi-dimensional 
abductive logic program of agent $\alpha$, and $S = \{1, \ldots, m\}$ a set of states. Let the 
vertex $\alpha' \in V$ be the inspection point of $\alpha$. We will extend the DAG $D$ with 
two source vertices: an initial vertex $sa$ for atoms and an initial vertex $sp$ for 
projects. We will then extend $D$ with a set of directed edges $(sa_0, s')$ and $(sp_s, s')$ 
connecting $sa$ to all the sources $s'$ in $D$ at state 0, and connecting $sp$ to all the 
sources $s'$ in $D$ at every state $s \in S$. Let $\mathcal{K}$ be an arbitrary set of propositional 
variables as described above, and $\overline{\mathcal{K}}$ the propositional language extending $\mathcal{K}$ to: 

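$$\overline{\mathcal{K}} = \mathcal{K} \cup \left\{ A^-, A_{v_s}, A^-_{v_s}, A_{P_{v_s}}, A^-_{P_{v_s}}, \mathit{reject}(A_{v_s}), \mathit{reject}(A^-_{v_s}) \mid A \in \mathcal{K},\ v \in V \cup \{sa, sp\},\ s \in S \right\}$$ 
$$\cup \left\{ \mathit{rel\_edge}(u_s, v_s), \mathit{rel\_path}(u_s, v_s), \mathit{rel\_vertex}(u_s), \mathit{edge}(u_s, v_s), \mathit{edge}(u_s, v_s)^- \mid u, v \in V \cup \{sa, sp\},\ s \in S \right\}$$ 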



Transformation of Rules and Updates 

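(RPR) Rewritten program rules: Let $\gamma$ be a function defined as follows: 

$$\gamma(v, s, F) = \begin{cases} 
A_{P_{v_s}} \leftarrow B \wedge C^- & \text{if } F \text{ is } A \leftarrow B \wedge \mathit{not}\ C \\ 
A^-_{P_{v_s}} \leftarrow B \wedge C^- & \text{if } F \text{ is } \mathit{not}\ A \leftarrow B \wedge \mathit{not}\ C \\ 
\mathit{false}_{P_{v_s}} \leftarrow B \wedge C^- \wedge Z_1 \wedge Z_2^- & \text{if } F \text{ is } \mathit{false} \leftarrow B \wedge \mathit{not}\ C \wedge Z_1 \wedge \mathit{not}\ Z_2 \\ 
B \wedge C^- \Rightarrow Z_{P_{v_s}} & \text{if } F \text{ is } B \wedge \mathit{not}\ C \Rightarrow Z \\ 
B \wedge C^- \Rightarrow Z^-_{P_{v_s}} & \text{if } F \text{ is } B \wedge \mathit{not}\ C \Rightarrow \mathit{not}\ Z 
\end{cases}$$ 

The rewritten rules are obtained from the original ones by replacing their heads $A$ 
(respectively, $\mathit{not}\ A$) by the atoms $A_{P_{v_s}}$ (respectively, $A^-_{P_{v_s}}$) and by replacing 
negative premises $\mathit{not}\ C$ by $C^-$. 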
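
The rewriting of rules can be sketched as a small function over a toy rule representation. The `Literal` and `Rule` classes, the string encoding of indexed atoms such as `happy^-_P[mu,1]`, and the tuple returned by `gamma` are assumptions of this sketch rather than the notation of the formal definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Literal:
    atom: str
    negated: bool = False  # True stands for "not atom"


@dataclass(frozen=True)
class Rule:
    head: Literal
    body: Tuple[Literal, ...] = ()
    active: bool = False   # True for an active rule "body => head"


def rewrite_premise(lit: Literal) -> str:
    """Replace a negative premise `not C` by the atom C^-; keep positive premises."""
    return f"{lit.atom}^-" if lit.negated else lit.atom


def gamma(v: str, s: int, rule: Rule) -> Tuple[str, Tuple[str, ...], bool]:
    """Rewrite a rule of vertex v at state s.

    The head A (or not A) becomes A_P[v,s] (or A^-_P[v,s]) and every negative
    premise not C becomes C^-. The result is (head, body, is_active).
    """
    sign = "^-" if rule.head.negated else ""
    head = f"{rule.head.atom}{sign}_P[{v},{s}]"
    body = tuple(rewrite_premise(lit) for lit in rule.body)
    return head, body, rule.active


if __name__ == "__main__":
    # r8 from the case study: false <- move_out, not get_married (mother, state 1).
    r8 = Rule(Literal("false"), (Literal("move_out"), Literal("get_married", True)))
    print(gamma("mu", 1, r8))
```

Integrity constraints need no special treatment in this sketch, since `false` is rewritten like any other head.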

(RU) Rewritten updates: Let $\sigma$ be a function defined as follows: 

$$\sigma(s, i \div C) = \gamma(i, s, C)$$ 

where $i \div C$ is not one of the cases below. Note that updates of the form 
$i \div (?{-}\ L_1 \wedge \ldots \wedge L_n)$ are not treated here. They will be treated at the agent cycle level (see 
Section 4). The following cases characterize the DAG rewritten updates: 

$$\sigma(s, i \div \mathit{edge}(u, v)) = \mathit{edge}(u_s, v_s)$$ 
$$\sigma(s, i \div \mathit{not}\ \mathit{edge}(u, v)) = \mathit{edge}(u_s, v_s)^-$$ 
$$\sigma(s, i \div \mathit{modify\_edge}(j, u, v, x)) = \mathit{modify\_edge}(j, u, v, x)$$ 
$$\sigma(s, i \div \mathit{not}\ \mathit{modify\_edge}(j, u, v, x)) = \mathit{modify\_edge}(j, u, v, x)^-$$ 

(IR) Inheritance rules: 

$$A_{v_s} \leftarrow A_{u_t} \wedge \mathit{not}\ \mathit{reject}(A_{u_t}) \wedge \mathit{rel\_edge}(u_t, v_s)$$ 
$$A^-_{v_s} \leftarrow A^-_{u_t} \wedge \mathit{not}\ \mathit{reject}(A^-_{u_t}) \wedge \mathit{rel\_edge}(u_t, v_s)$$ 

for all objective atoms $A \in \mathcal{K}$, for all $u, v \in V \cup \{sa\}$ and for all $s, t \in S$. The 
inheritance rules state that an atom $A$ is true (resp., false) in a vertex $v$ at state $s$ 
if it is true (resp., false) in any ancestor vertex $u$ at state $t$ and it is not rejected, 
i.e., forced to be false. 

(RR) Rejection Rules: 

$$\mathit{reject}(A^-_{u_s}) \leftarrow A_{P_{v_t}} \wedge \mathit{rel\_path}(u_s, v_t)$$ 
$$\mathit{reject}(A_{u_s}) \leftarrow A^-_{P_{v_t}} \wedge \mathit{rel\_path}(u_s, v_t)$$ 

for all objective atoms $A \in \mathcal{K}$, for all $u, v \in V \cup \{sa\}$ and for all $s, t \in S$. The 
rejection rules say that if an atom $A$ is true (respectively, false) in the program 
$P_v$ at state $t$, then it rejects inheritance of any false (respectively, true) atoms 
of any ancestor $u$ at any state $s$. 

(UR) Update Rules: 

$$A_{v_s} \leftarrow A_{P_{v_s}} \qquad A^-_{v_s} \leftarrow A^-_{P_{v_s}}$$ 

for all objective atoms $A \in \mathcal{K}$, for all $v \in V \cup \{sa\}$ and for all $s \in S$. The update 
rules state that an atom $A$ must be true (respectively, false) in the vertex $v$ at 
state $s$ if it is true (respectively, false) in the program $P_v$ at state $s$. 

(DR) Default Rules: 

$$A^-_{sa_0}$$ 

for all objective atoms $A \in \mathcal{K}$. Default rules describe the initial vertex $sa$ for 
atoms (at state 0) by making all objective atoms initially false. 

(CSR$_{v_s}$) Current State Rules: 

$$A \leftarrow A_{v_s} \qquad A^- \leftarrow A^-_{v_s}$$ 

for every objective atom $A \in \mathcal{K}$. Current state rules specify the current vertex 
$v$ and state $s$ in which the program is being evaluated and determine the values 
of the atoms $A$ and $A^-$. 

Transformation of projects 
(PPR) Project Propagation rules: 

$$Z_{u_s} \wedge \mathit{not}\ \mathit{rejectP}(Z_{u_s}) \wedge \mathit{rel\_edge}(u_s, v_s) \Rightarrow Z_{v_s}$$ 
$$Z^-_{u_s} \wedge \mathit{not}\ \mathit{rejectP}(Z^-_{u_s}) \wedge \mathit{rel\_edge}(u_s, v_s) \Rightarrow Z^-_{v_s}$$ 

for all projects $Z \in \mathcal{K}$, for all $u, v \in V \cup \{sp\}$ and for all $s \in S$. The project 
propagation rules propagate the validity of a project $Z$ through the DAG at 
a given state $s$. In contrast to inheritance rules, the validity of projects is not 
propagated along states. 

(PRR) Project Rejection Rules:

\[ \mathit{rejectP}(Z^-_{u_s}) \leftarrow Z_{P_{v_s}} \wedge \mathit{rel\_path}(u_s, v_s) \]

\[ \mathit{rejectP}(Z_{u_s}) \leftarrow Z^-_{P_{v_s}} \wedge \mathit{rel\_path}(u_s, v_s) \]

for all projects $Z \in \mathcal{K}$, for all $u, v \in V \cup \{sp\}$ and for all $s \in S$. The project
rejection rules state that if a project $Z$ is true (respectively, false) in the program
$P_v$ at state $s$, then it rejects propagation of any false (respectively, true) project
of any ancestor $u$ at state $s$.

(PUR) Project Update Rules:

\[ Z_{P_{v_s}} \Rightarrow Z_{v_s} \qquad Z^-_{P_{v_s}} \Rightarrow Z^-_{v_s} \]

for all projects $Z \in \mathcal{K}$, for all $v \in V \cup \{sp\}$ and for all $s \in S$. The project update
rules declare that a project $Z$ must be true (respectively, false) in $v$ at state $s$ if
it is true (respectively, false) in the program $P_v$ at state $s$.

(PDR) Project Default Rules:

\[ Z^-_{sp_s} \]

for all projects $Z \in \mathcal{K}$ and for all $s \in S$. Project default rules describe the initial
vertex $sp$ for projects at every state $s \in S$ by making all projects initially false.

(PAR($v_s$)) Project Adoption Rules:

\[ Z_{v_s} \Rightarrow Z \]

for every project $Z \in \mathcal{K}$. These rules, which apply only to positive projects, specify
the current vertex $v$ and state $s$ in which the project is being evaluated. Projects
evaluated to true are selected at the agent cycle level and executed.

Transformation rules for edge 
(ER) Edge Rules:

\[ \mathit{edge}(u_0, v_0) \]

for all $(u, v) \in E$. Edge rules describe the edges of $D$ at state 0. In addition, the following
rules characterize the edges of the initial vertices for atoms ($sa$) and projects
($sp$): $\mathit{edge}(sa_0, u_0)$ and $\mathit{edge}(sp_s, u_s)$ for all sources $u \in V$ and for all $s \in S$.

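(EIR) Edge Inheritance Rules:

\[ \mathit{edge}(u_{s+1}, v_{s+1}) \leftarrow \mathit{edge}(u_s, v_s) \wedge \mathit{not}\ \mathit{edge}(u_{s+1}, v_{s+1})^- \]

\[ \mathit{edge}(u_{s+1}, v_{s+1})^- \leftarrow \mathit{edge}(u_s, v_s)^- \wedge \mathit{not}\ \mathit{edge}(u_{s+1}, v_{s+1}) \]

\[ \mathit{edge}(u_s, u_{s+1}) \leftarrow \]

for all $u, v \in V$ such that $u$ is not the inspection point of the agent, and for all
$s \in S$. Note that EIR do not apply to edges containing the vertices $sa$ and $sp$.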

(RER($v_s$)) Current State Relevancy Edge Rules:

\[ \mathit{rel\_edge}(X, v_s) \leftarrow \mathit{edge}(X, v_s) \qquad \mathit{rel\_edge}(X, Y) \leftarrow \mathit{edge}(X, Y) \wedge \mathit{rel\_path}(Y, v_s) \]

Current state relevancy edge rules define which edges are in the relevancy graph
wrt. the vertex $v$ at state $s$.

(RPR) Relevancy Path Rules:

\[ \mathit{rel\_path}(X, Y) \leftarrow \mathit{rel\_edge}(X, Y) \]

\[ \mathit{rel\_path}(X, Z) \leftarrow \mathit{rel\_edge}(X, Y) \wedge \mathit{rel\_path}(Y, Z) \]

Relevancy path rules define the notion of relevancy path in a graph.
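
Read operationally, the relevancy edge and relevancy path rules amount to a reachability computation towards the current vertex. The following is a minimal Python sketch of that reading, assuming the graph is acyclic and that edges are given as pairs of (state-indexed) vertex names; the function name `relevancy` and the example vertices are illustrative only and do not come from the paper.

```python
def relevancy(edges, target):
    """Return (rel_edge, rel_path) with respect to the current vertex `target`.

    `edges` is an iterable of (X, Y) pairs of an acyclic graph.
    """
    edges = set(edges)
    # Vertices from which `target` can be reached (backward search).
    reaches = {target}
    changed = True
    while changed:
        changed = False
        for x, y in edges:
            if y in reaches and x not in reaches:
                reaches.add(x)
                changed = True
    # An edge (X, Y) is relevant iff Y is the target or Y has a path to it.
    rel_edge = {(x, y) for x, y in edges if y in reaches}
    # rel_path is the transitive closure of rel_edge.
    rel_path = set(rel_edge)
    changed = True
    while changed:
        changed = False
        for x, y in rel_edge:
            for y2, z in list(rel_path):
                if y == y2 and (x, z) not in rel_path:
                    rel_path.add((x, z))
                    changed = True
    return rel_edge, rel_path


# Hypothetical graph: a -> b -> c and a -> d, evaluated at vertex c.
rel_edge, rel_path = relevancy([("a", "b"), ("b", "c"), ("a", "d")], "c")
```

In this example the edge from a to d is not relevant, since d has no path to the current vertex c.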

3.3 Multi-dimensional Updates for Agents 

This section introduces the notion of multi-dimensional dynamic program update,
a transformation that, given an agent $\Psi_\alpha^m = (T, U)$, produces an abductive
logic program (as defined in [6]). Note first that an updating program $U$ for an
agent $\alpha$ can contain updates of the form $\alpha \div (?- L_1 \wedge \ldots \wedge L_n)$. Such updates
are intended to add $L_1 \wedge \ldots \wedge L_n$ to the current query of the agent $\alpha$. Thus,
those updates have to be treated differently. To achieve this, we introduce a
function $\beta$ defined over updating programs. Let $U$ be an updating program and
$\alpha$ an agent in $\mathcal{M}$. If $U$ does not contain any update starting with “$\alpha \div (?-$”,
then $\beta(\alpha, U) = (\mathit{true}, U)$. Otherwise, suppose that $U$ contains $n$ updates starting
with “$\alpha \div (?-$”, say $\alpha \div (?- C_1), \ldots, \alpha \div (?- C_n)$ for $n$ conjunctions $C_1, \ldots, C_n$
of atoms. Then, $\beta(\alpha, U) = (C_1 \wedge \ldots \wedge C_n,\ U - \{\alpha \div (?- C_1), \ldots, \alpha \div (?- C_n)\})$.

Given a set $Q$ of rules, integrity constraints and active rules, we write $\Omega_1(Q)$ and
$\Omega_2(Q)$ to indicate: $\Omega_1(Q) = \{C \mid$ for every rule and integrity constraint $C$ in $Q\}$
and $\Omega_2(Q) = \{R \mid$ for every active rule $R$ in $Q\}$. The notion of multi-dimensional
dynamic program update for agents follows.

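Definition 5. Let $m \in S$ be a state and $\Psi_\alpha^m = (T, U)$ an agent $\alpha$ at state
$m$. Let $T = (D, \mathcal{P}_D, \mathcal{A}, \mathcal{R}_D)$ and $U = \{U_s \mid s \in S \text{ and } s > 0\}$. The multi-dimensional
dynamic program update for agent $\alpha$ at the state $m$ is
$\Theta\Psi_\alpha^m = (P', \mathcal{A}, R')$ where:

\[ P' = \bigcup_{v \in V} \gamma(v, 0, P_v) \cup \bigcup_{s \in S} \Omega_1(Q_s) \cup IR \cup RR \cup UR \cup DR \cup CSR(\alpha_m) \cup PRR \cup PDR \cup ER \cup EIR \cup RER(\alpha_m) \cup RPR \]

\[ R' = \bigcup_{v \in V} \gamma(v, 0, R_v) \cup \bigcup_{s \in S} \Omega_2(Q_s) \cup PPR \cup PUR \cup PAR(\alpha_m) \]

where $\beta(\alpha, U_s) = (\_, P_s)$ and $a(s, P_s) = Q_s$, for every $s \in S$.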

Note that P' is a normal logic program, that is, default atoms can only occur in 
bodies of clauses. 
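
The function β introduced at the start of this section, which separates query updates from the rest of an updating program, can be sketched as follows. This is only an illustration under stated assumptions: the `Update` and `Query` classes, their field names, and the representation of a conjunction as a tuple of literals are all hypothetical and not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    literals: tuple        # conjunction L1 ∧ ... ∧ Ln, as a tuple of literals


@dataclass(frozen=True)
class Update:
    sender: str            # agent that proposed the update
    content: object        # a rule, constraint, active rule, or Query


def beta(agent, updating_program):
    """Split the agent's own query updates from the remaining updates.

    Returns (goal, rest): `goal` is the conjunction of all query bodies
    (or ("true",) if there are none) and `rest` is the set of other updates.
    """
    queries = [u for u in updating_program
               if u.sender == agent and isinstance(u.content, Query)]
    if not queries:
        return ("true",), set(updating_program)
    goal = tuple(lit for u in queries for lit in u.content.literals)
    return goal, set(updating_program) - set(queries)


# Hypothetical updating program of agent "a".
U = [Update("a", Query(("p",))), Update("b", "q"), Update("a", Query(("r", "s")))]
goal, rest = beta("a", U)
```

Here `goal` collects the literals p, r and s, while `rest` keeps only the update proposed by b.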

4 The Agent Cycle 

In this section we briefly sketch the behaviour of an agent. We omit the details 
to keep the description general and abstract. Every agent can be thought of 
as a multi-dimensional abductive logic program equipped with a set of inputs 
represented as updates. The abducibles are (names of) actions to be executed 
as well as explanations of observations made. Updates can be used to solve the 
goals of the agent as well as to trigger new goals. The basic “engine” of the 
agent is the IFF proof procedure [10,6], executed via the cycle represented in
Fig. 3. Assume that $\beta(\alpha, U_s) = (L_1 \wedge \ldots \wedge L_n, P_s)$, $a(s, P_s) = Q_s$ and
$\Psi_\alpha^{s-1} = (T, \{U_1, \ldots, U_{s-1}\})$.

The cycle of an agent $\alpha$ starts at state $s$ by observing any inputs (projects
from other agents) from the environment (step 1), and by recording them in
the form of updates in the updating program $U_s$. Then, the proof procedure is
applied for $r$ units of time (steps 2 and 3) with respect to the abductive logic
program $\Theta\Psi_\alpha^s$ obtained by updating the current one with the updating program
$U_s$. The new goal is obtained by adding the goals $\mathit{not}\ \mathit{false} \wedge L_1 \wedge \ldots \wedge L_n$ and the
active rules in $\Omega_2(Q_s)$ received from $U_s$ to the current goal $G$. The amount of
resources available in steps 2 and 3 is bounded by some amount $r$. Decreasing
$r$ makes the agent more reactive; increasing $r$ makes it more rational.

Cycle($\alpha$, $r$, $s$, $\Psi_\alpha^{s-1}$, $G$)

1. Observe and record any input in the updating program $U_s$.
2. Resume the IFF procedure by propagating the inputs wrt. the program
   $\Theta\Psi_\alpha^s$ and the goal $G \wedge \mathit{not}\ \mathit{false} \wedge L_1 \wedge \ldots \wedge L_n \wedge \Omega_2(Q_s)$.
3. Continue applying the IFF procedure, using for steps 2 and 3 a total of
   $r$ units of time. Let $G'$ be the last formula in this derivation.
4. Select from $G'$ the projects that can be executed.
5. Execute the selected projects.
6. Cycle with ($\alpha$, $r$, $s + 1$, $\Psi_\alpha^s$, $G'$).

Fig. 3. The agent cycle

Afterwards (steps 4 and 5), the selectable projects are selected from $G'$, the
last formula of the derivation, and executed under the control of an abductive
logic programming proof procedure such as ABDUAL in [3]. If the selected
project takes the form $j{:}C$ (meaning that agent $\alpha$ intends to update the theory
of agent $j$ with $C$ at state $s$), then (once executed) the update $\alpha \div C$ will be
available (at the next cycle) in the updating program of $j$. Selected projects can
be thought of as outputs into the environment, and observations as inputs from
the environment. From every agent’s viewpoint, the environment contains all
other agents. Every disjunct in the formulae derived from the goal $G$ represents
an intention, i.e., a (possibly partial) plan executed in stages. A sensible action
selection strategy may select actions from the same disjunct (intention) at different
iterations of the cycle. Failure of a selected plan is obtained via logical
simplification, after having propagated false into the selected disjunct.

Integrity constraints provide a mechanism for constraining explanations and 
plans, and action rules for allowing a condition-action type of behaviour. 
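
The control flow of the cycle in Fig. 3 can be sketched in Python as below. This is a skeleton under strong assumptions: the IFF proof procedure is not implemented, and the callables `observe`, `think`, `select` and `execute` are hypothetical placeholders for steps 1, 2-3, 4 and 5 respectively. The recursion of step 6 is written as a loop.

```python
def run_cycle(agent, r, s, goal, observe, think, select, execute, steps=1):
    """Run `steps` iterations of the observe-think-act cycle.

    observe(agent, s)               -> updating program for state s   (step 1)
    think(agent, updates, goal, r)  -> new goal, bounded by r         (steps 2-3)
    select(goal)                    -> projects that can be executed  (step 4)
    execute(agent, s, projects)     -> performs the projects          (step 5)
    """
    updates = {}                                   # state -> updating program
    for _ in range(steps):
        updates[s] = observe(agent, s)             # step 1
        goal = think(agent, updates, goal, r)      # steps 2 and 3
        projects = select(goal)                    # step 4
        execute(agent, s, projects)                # step 5
        s += 1                                     # step 6: cycle at state s + 1
    return s, goal, updates


# Toy run with stub components (all hypothetical).
log = []
final_state, final_goal, seen = run_cycle(
    "a", 5, 1, ("g",),
    observe=lambda ag, s: {f"in{s}"},
    think=lambda ag, ups, g, r: g + tuple(sorted(ups[max(ups)])),
    select=lambda g: [x for x in g if x.startswith("in")],
    execute=lambda ag, s, ps: log.append((s, tuple(ps))),
    steps=2,
)
```

Keeping the four steps as separate callables mirrors the paper's point that the bound r only constrains the thinking phase.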

5 Conclusions and Future Work 

Throughout this paper we have extended the Multi-dimensional Dynamic Logic
Programming framework to include integrity constraints and active rules, as
well as to allow the associated acyclic directed graph to be updatable. We have
shown how to express, in a multi-agent system, each agent’s viewpoint regarding
its place in its perception of others. This is achieved by composing agents’ views of
one another in acyclic directed graphs, one for each agent, which can evolve over
time by updating its edges. Agents themselves have their knowledge expressed by 
abductive generalized logic programs with integrity constraints and active rules. 
They are kept active through an observe-think-act cycle, and communicate and 
evolve by realizing their projects to do so, in the form of updates acting on other 
agents or themselves. Projects can be knowledge rules, active rules, integrity 
constraints, or abductive queries. 

With respect to future work, we are following two lines of research. At the 
agent level, we are investigating how to combine logical theories of agents ex- 
pressed over graph structures [13], and how to express preferences among the 
beliefs of agents [7]. At the multi-agent system level, we are investigating
the representation and evolution of dynamic societies of agents. A first step
towards this goal is to allow the set of vertices of the acyclic directed graph to be
updatable as well, representing the birth (and death) of the society’s agents. Our
ongoing work and future projects are discussed in [2].
1 Introduction

In a previous paper [5] we presented a combination of the dynamic logic program- 
ming paradigm proposed by J. J. Alferes et al. [1,10] and a version of KS-agents 
proposed by Kowalski and Sadri [7]. In the resulting framework, rational, reactive 
agents can dynamically change their own knowledge bases as well as their own 
goals. In particular, at every iteration of an observe-think-act cycle, the agent 
can make observations, learn new facts and new rules from the environment, 
and then it can update its knowledge accordingly. The agent can also receive a 
piece of information that contrasts with its knowledge. To resolve possible cases of
contradiction within the theory of an agent, techniques of contradiction removal
and preferences among several sources can be adopted [8]. The actions of an
agent are modeled by means of updates, inspired by the approach in [3]. A semantic
characterization of updates is given in [1] as a generalization of the stable
model semantics of normal logic programs [6]. Such a semantics is generalized
to the three-valued case in [3], which enables us to update programs under the
well-founded semantics.

J. J. Alferes and L. M. Pereira [2] propose a logic programming framework 
that combines two distinct forms of reasoning: preferring and updating. In par- 
ticular, they define a language capable of considering sequences of logic programs 
that result from the consecutive updates of an initial program, where it is pos- 
sible to define a priority relation among the rules of all successive programs. 
Updating and preferring complement each other in the sense that updates cre- 
ate new models, while preferences allow us to select among pre-existing models. 
Moreover, within the framework, the priority relation can itself be updated. In 
[2] the authors also define a declarative semantics for such a language. 

In this paper we set forth an extension of the language introduced in [5] to 
allow agents to update their knowledge and to prefer among alternative choices. 
In the resulting framework agents can not only update their own knowledge,
for example on advice from some other agent, but can also express preferences
about their own rules. Preferences will allow an agent to select among alternative 
models of its knowledge base. In our framework, preferences can also be updated, 
possibly on advice from others. We exhibit examples to show how our approach 
functions, including how preferring can enhance choice of reactivity in agents. 
The declarative semantics of this approach to agents can be found in [4] . 

In the remainder of the paper, we will use the following example (taken from
[2]) as a working example. This example, where rules as well as preferences 
change over time, shows the need to combine preferences and updates in agents, 
including the updating of preferences themselves. 

Happy story. (0) In the initial situation Maria is living and working everyday 
in the city. (1) Next, as Maria has received some money, Maria conjures up
other, alternative but more costly, living scenarios, namely traveling, settling
on a mountain, or living by the beach. And, to go with these, also the attending
preferences, but still in keeping with the work context, namely that the city is 
better for that purpose than any of the new scenarios, which are otherwise in- 
comparable amongst themselves. (2) Consequently, Maria decides to quit working 
and go on vacation, supported by her increased wealth, and hence to define her 
vacation priorities. To wit, the mountain and the beach are each preferable to 
travel, which in turn gainsays the city. (3) Next, Maria realizes her preferences 
keep her all the while undecided between the mountain and the beach, and advised 
by a friend opts for the former. 

To keep the notation short, we adopt some abbreviations: we let c stand for
“living in the city”, mt for “settling on a mountain”, b for “living by the beach”,
t for “traveling”, wk for “work”, vac for “vacations”, and mo for “possessing
money”.



2 Enabling Agents to Update Their Knowledge 

This section introduces the language and concepts that allow agents to update 
their knowledge. Typically, an agent can hold positive and negative information. 
An agent, for example, looking at the sky, can observe that it is snowing. Then, 
after a while, when weather conditions change, the agent observes that it is not 
snowing anymore. Therefore, it has to update its own knowledge with respect
to the newly observed information. This implies that the language of an agent
should be expressive enough to represent both positive and negative information. 
Knowledge updating refers not only to facts, as above, but also to rules which 
can also be overridden by newer rules which negate previous conclusions. 

In order to represent negative information in logic programs, we need more
general logic programs that allow default negation not A not only in premises of
their clauses but also in their heads¹. We call such programs generalized logic
programs. It is convenient to syntactically represent generalized logic programs
as propositional Horn theories. In particular, we represent default negation not A
as a standard propositional variable. Propositional variables whose names do
not begin with “not” and do not contain the symbols “:” and “÷” are called
objective atoms. Propositional variables of the form not A are called default
atoms. Objective atoms and default atoms are generically called atoms.

¹ For further motivation and intuitive reading of logic programs with default negations
in the heads see [1].

Agents interact by exchanging information: for instance, an agent $\beta$
communicates $C$ to an agent $\alpha$. Then, $\alpha$ may or may not believe $C$. In case $\alpha$
does believe it, $\alpha$ has to update its knowledge accordingly. This kind of agent
interaction is formalized via the concepts of project and update. Propositional
variables of the form $\alpha{:}C$ (where $C$ is defined below) are called projects. $\alpha{:}C$
denotes the intention (of some agent $\beta$) of proposing to update the theory of
agent $\alpha$ with $C$. We assume that projects cannot be negated.

Propositional variables of the form $\beta \div C$ are called updates. $\beta \div C$ denotes
an update that has been proposed by $\beta$ of the current theory (of some agent $\alpha$)
with $C$. We assume that updates can be negated. A negated update $\mathit{not}\ \beta \div C$
in the theory of an agent $\alpha$ indicates that agent $\beta$ did not have the intention of
proposing to update the theory of agent $\alpha$ with $C$.

Definition 1. A generalized rule is a rule of the form $L_0 \leftarrow L_1 \wedge \ldots \wedge L_n$
($n \geq 0$), where $L_0$ ($L_0 \neq \mathit{false}$) is an atom and every $L_i$ ($1 \leq i \leq n$) is an atom,
an update or a negated update.

Note that, according to the definition above, only objective atoms and default 
atoms can occur in the head of generalized rules. 

Definition 2. An integrity constraint is a rule of the form
$\mathit{false} \leftarrow L_1 \wedge \ldots \wedge L_n \wedge Z_1 \wedge \ldots \wedge Z_m$ ($n \geq 0$, $m \geq 0$), where every $L_i$ ($1 \leq i \leq n$)
is an atom, an update or a negated update, and every $Z_j$ ($1 \leq j \leq m$) is a project.

Integrity constraints are rules that enforce some condition over the state, and 
therefore always take the form of denials (without loss of generality) in a 2- valued 
semantics. Note that generalized rules are distinct from integrity constraints 
and should not be reduced to them. In fact, in generalized rules it is of crucial 
importance which atom occurs in the head. 

Definition 3. A generalized logic program P is a set of generalized rules and 
integrity constraints. 

Definition 4. A query $Q$ takes the form $?- L_1 \wedge \ldots \wedge L_n$ ($n \geq 1$), where every
$L_i$ ($1 \leq i \leq n$) is an atom, an update or a negated update.

Reactivity is a useful ability that agents must exhibit, e.g., if something happens,
then the agent needs to perform some action. An agent’s reactive behaviour
is formalized via the notion of active rules. As active rules will be evaluated 
bottom-up, we employ a different notation for them to emphasize this aspect. 

Definition 5. An active rule is a rule of the form $L_1 \wedge \ldots \wedge L_n \Rightarrow Z$ ($n \geq 0$),
where every $L_i$ ($1 \leq i \leq n$) is an atom, an update or a negated update, and $Z$ is
a project.
Active rules are rules that can modify the current state, to produce a new state,
when triggered. If the body $L_1 \wedge \ldots \wedge L_n$ of the active rule is satisfied, then
the project (fluent) $Z$ can be selected and executed. The head of an active
rule must be a project that is either internal or external. An internal project
operates on the state of the agent itself, e.g., if an agent gets an observation,
then it updates its knowledge, or if some conditions are met, then it executes
some goal. External projects instead operate on the state of other agents, e.g.,
when an agent $\alpha$ proposes to update the theory of another agent $\beta$.

Example 1. Suppose that the underlying theory of Maria (represented with m)
contains the following active rules:

\[ mo \Rightarrow m{:}\mathit{not}\ wk \]
\[ b \Rightarrow m{:}goToBeach \]
\[ t \Rightarrow p{:}bookTravel \]

The heads of the first two active rules are projects internal to Maria. The first
states that if Maria has money, then she wants to update her own theory with
not wk. The head of the last active rule is an external project: if Maria wants to
travel, she proposes to Pedro (represented with p) to book a trip.
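
As a rough illustration of how the active rules of Example 1 might be evaluated bottom-up, the sketch below fires every rule whose body is satisfied by a given set of true atoms. It assumes, for simplicity, that rule bodies contain only positive atoms (the definition also allows updates and negated updates); the `ActiveRule` type and `triggered_projects` function are hypothetical names.

```python
from typing import NamedTuple


class ActiveRule(NamedTuple):
    body: frozenset   # atoms that must hold for the rule to fire
    target: str       # agent the project is addressed to
    content: str      # proposed update C of the project target:C


def triggered_projects(rules, true_atoms):
    """Return the projects (target, content) whose rule bodies are satisfied."""
    atoms = set(true_atoms)
    return [(r.target, r.content) for r in rules if r.body <= atoms]


# The three active rules of Example 1.
maria_rules = [
    ActiveRule(frozenset({"mo"}), "m", "not wk"),
    ActiveRule(frozenset({"b"}), "m", "goToBeach"),
    ActiveRule(frozenset({"t"}), "p", "bookTravel"),
]

projects = triggered_projects(maria_rules, {"mo", "wk"})
```

With mo and wk true, only the first rule fires, yielding the internal project m:not wk.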

We assume that for every project $\alpha{:}C$, $C$ is either a generalized rule, an integrity
constraint, an active rule or a query. Thus, a project can only take one of the
following forms:

\[ \alpha{:}(L_0 \leftarrow L_1 \wedge \ldots \wedge L_n) \]
\[ \alpha{:}(\mathit{false} \leftarrow L_1 \wedge \ldots \wedge L_n \wedge Z_1 \wedge \ldots \wedge Z_m) \]
\[ \alpha{:}(L_1 \wedge \ldots \wedge L_n \Rightarrow Z) \]
\[ \alpha{:}(?- L_1 \wedge \ldots \wedge L_n) \]

Note that projects can only occur in the head of active rules and in the body
of integrity constraints. Updates and negated updates can occur in the body
of generalized rules and in the body of integrity constraints. For example, the
integrity constraint $\mathit{false} \leftarrow A \wedge \beta{:}B$ in the theory of an agent $\alpha$ enforces the
condition that $\alpha$ cannot perform a project $\beta{:}B$ when $A$ holds. The active rule
$A \wedge \mathit{not}\ \beta \div B \Rightarrow \beta{:}C$ in the theory of $\alpha$ states that project $\beta{:}C$ is to be performed
if $A$ holds and $\beta$ has not proposed to update the theory of $\alpha$ with $B$.

When an agent $\alpha$ receives an update $\beta \div C$, $\alpha$ may or may not believe $C$
depending on whether $\alpha$ trusts $\beta$. This behaviour, characterized at the semantic
level in [4], can be understood with an axiom schema of the form:

\[ C \leftarrow \beta \div C \wedge \mathit{not}\ \mathit{distrust}(\beta \div C) \]

in the theory of $\alpha$.

3 Enabling Agents to Prefer 

Preferring is an important ability that agents must exhibit to select among 
alternative choices. Consider for instance the following example. 
Example 2. Let Q be the underlying theory of Maria.

\[
\begin{array}{ll}
c \leftarrow \mathit{not}\ mt \wedge \mathit{not}\ b \wedge \mathit{not}\ t & (1) \\
wk & (2) \\
vac \leftarrow \mathit{not}\ wk & (3) \\
mt \leftarrow \mathit{not}\ c \wedge \mathit{not}\ b \wedge \mathit{not}\ t \wedge mo & (4) \\
b \leftarrow \mathit{not}\ c \wedge \mathit{not}\ mt \wedge \mathit{not}\ t \wedge mo & (5) \\
t \leftarrow \mathit{not}\ c \wedge \mathit{not}\ mt \wedge \mathit{not}\ b \wedge mo & (6)
\end{array}
\]

As Q has a unique 2-valued model {c, wk}, Maria decides to live in the city (c).
Things change if we add the rule mo (indicating that Maria possesses money)
to Q. In this case, Q has four models: {c, mo, wk}, {mt, mo, wk}, {b, mo, wk} and
{t, mo, wk}, and thus c is no longer true in all of them. Maria is now unable to
decide where to live.
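
The model counts in Example 2 can be checked by brute force. The sketch below enumerates stable models of a normal logic program via the Gelfond-Lifschitz reduct; it assumes that the "2-valued models" of the example are stable models, which matches the stable-model-based semantics cited in Section 1 but is an interpretation rather than something the example states. The rule encoding and function names are illustrative.

```python
from itertools import combinations

# A rule is (head, positive_body, negative_body); rules (1)-(6) of Example 2.
Q = [
    ("c",   (),      ("mt", "b", "t")),   # (1)
    ("wk",  (),      ()),                 # (2)
    ("vac", (),      ("wk",)),            # (3)
    ("mt",  ("mo",), ("c", "b", "t")),    # (4)
    ("b",   ("mo",), ("c", "mt", "t")),   # (5)
    ("t",   ("mo",), ("c", "mt", "b")),   # (6)
]


def least_model(definite_rules):
    """Least model of a negation-free program given as (head, body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model


def stable_models(program):
    """Enumerate all stable models by testing every candidate set of atoms."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, p, n in program for a in p + n})
    models = []
    for k in range(len(atoms) + 1):
        for candidate in combinations(atoms, k):
            m = set(candidate)
            # Reduct: drop rules blocked by m, then drop negative bodies.
            reduct = [(h, p) for h, p, n in program if not m & set(n)]
            if least_model(reduct) == m:
                models.append(m)
    return models


without_money = stable_models(Q)
with_money = stable_models(Q + [("mo", (), ())])
```

The exhaustive search is exponential in the number of atoms, which is acceptable only for toy programs such as this one.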

To incorporate the ability of preferring, we extend the agent language to make 
it possible to express preferences among the rules of the agent itself. Let < be a 
binary predicate symbol whose set of constants includes all the generalized rules. 

Definition 6. A priority rule is a generalized rule defining the predicate symbol <.

It is assumed that the set of constants of < does not include < itself. $r_1 < r_2$
means that rule $r_1$ is preferred to rule $r_2$.

Definition 7. A prioritized logic program P is a generalized logic program pos- 
sibly containing priority rules. 

The definition of prioritized logic program generalizes the one given in [5] to allow 
for priority rules. The intuition is that the program P formalizes the theory of 
an agent, and the priority rules express preferences among rules in P. 

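Example 3. Let P be the following prioritized logic program:

\[
P = Q \cup \{mo\} \cup \left\{
\begin{array}{l}
1 < 4 \leftarrow wk \\
1 < 5 \leftarrow wk \\
1 < 6 \leftarrow wk \\
4 < 6 \leftarrow vac \\
5 < 6 \leftarrow vac \\
6 < 1 \leftarrow vac
\end{array}
\right\}
\]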

For simplicity, we adopt unique numbers for rules, instead of the rules themselves, 
in the priority relation <. The first three priority rules in P say that rule 1 is 
preferable to the rules 4, 5 and 6 if wk holds. If vac holds, then the rules 4 
and 5 are both preferable to rule 6 which in turn is preferable to rule 1. As wk 
holds while vac does not, the first three priority rules are used to characterise 
the preferred model: {c,mo,wk}. Thus, by reducing the number of models of 
a program, priority rules allow us to select among several alternative models. 
Priority rules do not always allow us to select exactly one model. This may
happen when the priority rules do not specify a complete linear order among the
alternative choices. Suppose for instance that wk is false; then vac would hold
and P would have two models: {mt, mo, vac} and {b, mo, vac}.
As priority rules are generalized rules, preferences can also be updated. For
example, we can update the priority rules of the program P in the previous
example by updating with the rule $4 < 5 \leftarrow vac$. In this case, P will have a unique
model (also in case vac holds).

4 Agents and Multi-agent Systems

This section presents the conception of agent and of multi-agent system. The 
initial knowledge of an agent is modelled by the notion of initial theory. 

Definition 8. The initial theory T of an agent a is a pair (P, R), where P is a 
prioritized logic program and R is a set of active rules. 

P formalizes the initial knowledge state of the agent, and R characterizes its 
reactive behaviour. The knowledge of an agent can dynamically evolve when the 
agent receives new knowledge, albeit by self-updating rules. The new knowledge 
is represented in the form of an updating program. 

Definition 9. Let $\mathcal{M}$ be a multi-agent system (defined below). An updating
program $U$ is a finite set of updates such that if an update $\alpha \div C \in U$ then $\alpha$ is an
agent of $\mathcal{M}$.

An updating program contains the updates that will be performed on the current 
knowledge state of the agent. To characterize the evolution of the knowledge of 
an agent we need to introduce the notion of sequence of updating programs. 
Let $S = \{0, 1, \ldots, m\}$ be a set of natural numbers. We call the elements $s \in S$
states. A sequence of updating programs $U = \{U^s \mid s \in S \text{ and } s > 0\}$ is a set of
updating programs superscripted by the states $s \in S$.

Definition 10. Let $s \in S$ be a state. An agent $\alpha$ at state $s$, written as $\Psi_\alpha^s$, is
a pair $(T, U)$, where $T$ is the initial theory of $\alpha$, and $U = \{U^1, \ldots, U^s\}$ is a
sequence of updating programs. If $s = 0$, then $U = \{\}$.

An agent $\alpha$ at state 0 is defined by its initial theory and an empty sequence of
updating programs, that is $\Psi_\alpha^0 = (T, \{\})$. At state 1, $\alpha$ is defined by $(T, \{U^1\})$,
where $U^1$ is the updating program containing all the updates that $\alpha$ has received
at state 1, either from other agents or as self-updates. In general, an agent $\alpha$
at state $s$ is defined by $\Psi_\alpha^s = (T, \{U^1, \ldots, U^s\})$, where each $U^i$ is the updating
program containing the updates that $\alpha$ has received at state $i$.

Definition 11. A multi-agent system $\mathcal{M} = \{\Psi_{\alpha_1}^s, \ldots, \Psi_{\alpha_n}^s\}$ at state $s$ is a set of
agents $\alpha_1, \ldots, \alpha_n$, each of which is at state $s$.

Note that the definition of multi-agent system characterizes a static society of 
agents in the sense that it is not possible to add/remove agents from the system, 
and all the agents are at one common state. 

To begin with, the system starts at state 0 where each agent is defined by
$\Psi_\alpha^0 = (T_\alpha, \{\})$. Suppose that at state 0 an agent $\beta$ proposes an update of the
knowledge state of agent $\alpha$ with $C$ by triggering the project $\alpha{:}C$. Then, at the
next common state $\alpha$ will receive the update $\beta \div C$ indicating that an update has
been proposed by $\beta$. Thus, at state 1, $\alpha$ will be $(T_\alpha, \{U^1\})$, where $U^1 = \{\beta \div C\}$
if no other updates occur for $\alpha$.
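
The bookkeeping described above, where a project triggered in one state shows up as an update in the target's updating program at the next common state, can be sketched as follows. The classes `Agent` and `MultiAgentSystem` and their methods are hypothetical names introduced for illustration; trust, preferences and the agents' reasoning are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    theory: tuple                                 # initial theory T = (P, R)
    updates: dict = field(default_factory=dict)   # state i -> updating program U^i


class MultiAgentSystem:
    def __init__(self, agents):
        self.agents = {a.name: a for a in agents}
        self.state = 0
        self.pending = []        # projects triggered in the current state

    def trigger(self, sender, target, content):
        """Agent `sender` triggers the project target:content."""
        self.pending.append((sender, target, content))

    def step(self):
        """Move all agents to the next common state and deliver the updates."""
        self.state += 1
        for agent in self.agents.values():
            agent.updates[self.state] = set()
        for sender, target, content in self.pending:
            # The project target:content becomes the update sender÷content.
            self.agents[target].updates[self.state].add((sender, content))
        self.pending = []


mas = MultiAgentSystem([Agent("alpha", ("P", "R")), Agent("beta", ("P", "R"))])
mas.trigger("beta", "alpha", "C")   # beta triggers alpha:C at state 0
mas.step()                          # at state 1 alpha holds the update beta÷C
```

Advancing every agent in the same `step` call reflects the assumption that all agents share one common state.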

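Example 4. This example formalizes the happy story as described in Section 1.
Let the initial theory $T = (P, R)$ of Maria be:

\[
R = \left\{
\begin{array}{l}
mo \Rightarrow m{:}\mathit{not}\ wk \\
b \Rightarrow m{:}goToBeach \\
mt \Rightarrow m{:}goToMountains
\end{array}
\right\}
\]

\[
P = \left\{
\begin{array}{l}
c \leftarrow \mathit{not}\ mt \wedge \mathit{not}\ b \wedge \mathit{not}\ t \\
wk \\
vac \leftarrow \mathit{not}\ wk \\
mt \leftarrow \mathit{not}\ c \wedge \mathit{not}\ b \wedge \mathit{not}\ t \wedge mo \\
b \leftarrow \mathit{not}\ c \wedge \mathit{not}\ mt \wedge \mathit{not}\ t \wedge mo \\
t \leftarrow \mathit{not}\ c \wedge \mathit{not}\ mt \wedge \mathit{not}\ b \wedge mo \\
1 < 4 \leftarrow wk \\
1 < 5 \leftarrow wk \\
1 < 6 \leftarrow wk \\
4 < 6 \leftarrow vac \\
5 < 6 \leftarrow vac \\
6 < 1 \leftarrow vac
\end{array}
\right\}
\]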

At state 0, suppose that Maria receives the information from some agent $l$ that
she has received some money, that is, $U^1 = \{l \div mo\}$. Thus, as Maria does not
distrust agent $l$, at state 1 she believes mo and therefore she proposes the internal
update m:not wk by triggering the first active rule in R. At state 2 Maria does
not work and therefore decides to take a vacation. Unfortunately, the theory of
Maria has two models:

{mt, vac, mo, 4<6, 4<1, 5<6, 5<1, 6<1}

{b, vac, mo, 4<6, 4<1, 5<6, 5<1, 6<1}

and therefore Maria is unable to choose between living by the beach and
settling on a mountain. Suppose now that Maria is advised by a friend $f$
to prefer mountains to beaches, that is, $U^3 = \{f \div (4 < 5 \leftarrow vac)\}$. At state 3 the
theory of Maria has a single preferred model:

{mt, vac, mo, 4<5, 4<6, 4<1, 5<6, 5<1, 6<1}

and Maria can happily settle on the mountains.

5 Conclusion and Future Work 

We have presented a logical framework of a multi-agent system in which each 
agent can communicate with and update other agents, can react, and is able to 
prefer. In the near future, we plan to develop a proof procedure for updating and 
preferring reasoning and to prove it correct and complete with respect to the
declarative semantics presented in [4]. We are also investigating possible
generalizations of the multi-agent system to make it non-synchronous, and dynamic
in the sense that agents can enter and leave the system.
Abstract. Organisations can be defined as a set of entities regulated by
mechanisms of social order and created by more or less autonomous actors to 
achieve common goals. Multi-agent systems are a natural choice to design 
organisational systems due to the proactive and autonomous behaviour of 
agents. However, in business environments it is necessary to consider the 
behaviour of the global system and the collective aspects of the domain. In this 
paper, we argue that multi-agent systems should be designed around 
organisational co-ordination frameworks that reflect the co-ordination 
structures of the particular organisation. As in human societies, we argue that 
norms and institutions are a way for agent societies to cope with the challenge 
of social order. Through institutions, conventions and interaction patterns for 
the co-ordination of agents can be specified, monitored and managed. 



Keywords: Agent societies, co-ordination, institutions, virtual organisations 



1. Introduction 

In an increasing number of domains, organisations need to work together in 
transactions, tasks or missions. Work relationships between people and enterprises are 
also shifting, from the ‘job-for-life’ paradigm to project-based virtual enterprises in 
which people and organisations become independent contractors. Furthermore, there 
is often a decentralised ownership of data, expertise, control and resources involved in 
business processes. Often, multiple, physically distributed organisations (or parts
thereof) are involved in one business process. Each organisation, or part of an
organisation, attempts to maximise its own profit within the overall activity. Different 
groups within organisations are relatively autonomous, in the sense that they control 
how their resources are created, managed or consumed, and by whom, at what cost, 
and in what time frame. There is a high degree of natural concurrency (many 
interrelated tasks and actors are working simultaneously at any given point of the 
business process) which makes it imperative to be able to monitor and manage the
overall business process (e.g. total time, total budget, etc.). The above considerations 
show an increasing need for transparency in the representation and implementation of 
business processes. However, the fact that business processes are highly dynamic and 
unpredictable makes it difficult to give a complete a priori specification of all the
activities that need to be performed, what their knowledge needs are, and how they
should be ordered.

An organisation can be seen as a set of entities regulated by mechanisms of social 
order and created by more or less autonomous actors to achieve common goals. 
Because of the proactive and autonomous behaviour of agents it is natural to design 
organisation systems using agent societies that mimic the behaviour and structure of 
human organisations [22]. Agent societies represent the interactions between agents 
and are as such the virtual counterpart of real-life societies and organisations. Agents 
model specific roles in the society and interact with others as a means to accomplish 
their goals. This perspective makes the design of the system less complex since it 
reduces the conceptual distance between the system and the real-world application it 
has to model. Therefore, agent societies are an effective platform for virtual 
organisations because they provide mechanisms to allow organisations to advertise 
their capabilities, negotiate their terms, exchange rich information, and synchronise 
processes and workflow at a high-level of abstraction [18]. 

Business environments must consider the behaviour of the global system and be 
able to incorporate collective characteristics of an organisation such as stability over 
time, some level of predictability, and clear commitment to aims and strategies. 
However, typically, agents are assumed to pursue their own individual goals and 
global behaviour emerges from individual interactions. Existing architectures, 
behavioural strategies and models for group formation often assume this individualist 
perspective, which is not suitable for the representation of collective characteristics of 
an organisation. 

In this paper, we argue that multi-agent systems developed to model and support 
organisations must be based on co-ordination frameworks that mimic the structure of 
the particular organisation. Methodologies for designing such multi-agent systems 
have to be able to describe and apply different types of co-ordination models. As in 
human societies, we argue that norms and institutions are a way for agent societies to 
cope with the challenge of social order. Agents act autonomously according to their 
own goals and capabilities. Institutions are needed to enforce the global behaviour of 
the society and assure that the global goals of the society are met. Different 
co-ordination models have different needs in terms of how institutions can manage them, 
and consequently in which types of roles are present in the institution and which 
capabilities the agents fulfilling those roles should have. 

The paper is organised as follows. In section 2 we introduce a model for agent 
societies that is based on the structural characteristics of an organisation and 
supported by different co-ordination frameworks. The role of institutions in the 
engineering of agent societies is described in section 3. In section 4, the 
characteristics of the different frameworks are described in more detail. Practical 
applications of this model being developed at Achmea are described in section 5. 
Finally, in section 6 we present some conclusions and indicate directions for future 
work. 

2. Organisational Multi-agent Systems 

There is a rising awareness that multi-agent systems and cyber-societies can best be 
understood and developed if they are inspired by human social phenomena [1, 5, 23]. 
Organisations can be seen as sets of entities regulated by mechanisms of social order 
and created by more or less autonomous actors to achieve common goals. Multi-agent 
systems that model and support organisations should therefore be based on co- 
ordination frameworks that mimic the structure of the particular organisation and be 
able to dynamically adapt to changes in organisation structure, aims and interactions. 
The structure of the organisation determines important autonomous activities that 
must be explicitly organised into autonomous entities and relationships in the 
conceptual model of the agent society [11]. 

In a business environment, the behaviour of the global system and the collective 
aspects of the domain, such as stability over time, predictability and commitment to 
aims and strategies, must be considered. Organisations are expected to form a 
coherent, stable system that realises the objectives for which it was designed. When 
multi-agent systems, or agent societies, are considered from an organisational point 
of view, the concept of desirable social behaviour becomes of utmost importance. 
That is, from the organisational point of view, the behaviour of individual agents in a 
society should be understood and described in relation to the social structure and 
overall objectives of the society. However, until recently, multi-agent systems have 
mainly been viewed from an individualistic perspective, that is, as aggregations of agents 
that interact with each other [13]. This view looks at the behaviour of multi-agent 
systems from the perspective of the agent itself, in terms of how an agent can affect 
the environment or be affected by it. 

Open societies assume that participating agents are designed and developed outside 
the scope and design of the society itself and therefore the society cannot rely on the 
embedding of organisational and normative elements in the intentions, desires and 
beliefs of participating agents but must represent these elements explicitly. 

The above considerations lead to the following requirements for engineering 
methodologies for agent societies: 

Agent societies must include formalisms for the description, construction and 
control of the organisational and normative elements of a society (roles, norms 
and goals) instead of just agent states [1, 23]. 

The methodology must provide mechanisms to describe the environment of 
the society and the interactions between agents and the society, and to 
formalise the expected outcome of roles in order to verify the overall 
animation of the society. 

The organisational and normative elements of a society must be explicitly 
specified since an open society cannot rely on its embedding in the intentions, 
desires and beliefs of each agent [7, 17]. 

Methods and tools are needed to verify whether the design of an agent society 
satisfies its design requirements and objectives [15]. 

The methodology should provide building directives concerning the 
communication capability and ability to conform to the expected role 
behaviour of agents participating in the society. 

One last point is that in order to facilitate the development of organisation-oriented 
multi-agent systems it is important to relate to the organisational perception of the 
domain. That is, a common ground of understanding must be found between agent 
engineers and organisational practitioners. In our opinion co-ordination is an ideal 
candidate. On the one hand, organisational science and economics have long 
researched co-ordination and organisational structures. Relationships between and 
within organisations are developed for the exchange of goods, resources, information 
and so on. Depending on transaction costs and interdependent relations, different co- 
ordination models (market, hierarchy or network) are possible. On the other hand, co- 
ordination is one of the cornerstones of agent societies and is considered an important 
problem inherent to the design and implementation of MAS [2]. However, the 
implications of the co-ordination model for the agent society architecture and design 
method have usually not been considered. So far, research about co-ordination in 
MAS has been mainly limited to the study of technical aspects of co-ordination, such 
as control and planning. In many cases the social organisation is left implicit in the 
design of the agent society. An agent society model that incorporates co-ordination 
issues related to the organisational perspective of the domain will thus facilitate the 
introduction of multi-agent systems in organisations. Co-ordination therefore forms 
the basis for the model for agent societies introduced in this paper. The following 
notions are core concepts in our model: 

Agents are the inhabitants of the agent society that interact with each other 
using the communication framework. Agents are designed outside the scope of 
the society, and may have their own goals and behaviour rules. Every agent 
within the society must adopt some role(s). 

Roles are patterns of behaviour. Roles are described in the society model in 
terms of externally perceived behaviour. 

Rules or constraints describe the desired behaviour of agents in the society 
and its consequences in terms of sanctions, rewards and limitations. 

Communication framework describes the interaction between agents. It 
includes the description of the society ontology (vocabulary understood within 
the society), the communication language (intentions and utterances) and the 
representation language for domain content. 

Goals are the overall objectives of the society. 

As described before, the design of organisation-oriented multi-agent systems must 
account for the representation and management of normative aspects of the society 
and incorporate collective characteristics of an organisation such as stability over 
time, some level of predictability, and clear commitment to aims and strategies. 
Human societies have successfully coped with similar issues through the use of 
institutions that monitor behaviour and enforce social laws. Therefore our agent 
society model consists of two layers. The institutional layer, or institution, provides 
the social and institutional backbone of the society and is the place where social 
norms and rules are explicitly specified. Institutional agent roles are designed to 
enforce the social behaviour of agents in the society and assure the achievement of 
global goals of the society. The operational layer models the overall objectives and 
intended action of the society and is therefore domain dependent. Interaction between 
agents in the operational level is not necessarily bound by the institution, and agents 
are free to act according to their own objectives. However, in order to join the society 
agents must commit themselves to the social rules described and enforced by the 
institution. 
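
As an illustration only, the core notions above (agents, roles, rules, goals and the condition for joining the society) could be represented as in the following minimal sketch. All class, field and method names (Role, Agent, Society, join, committed_to_rules) are hypothetical and are not part of the model itself; rules and goals are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A pattern of externally perceived behaviour (hypothetical representation)."""
    name: str
    institutional: bool = False  # True for roles of the institutional layer


@dataclass
class Agent:
    """An inhabitant of the society; its internals lie outside the society's scope."""
    name: str
    roles: set = field(default_factory=set)
    committed_to_rules: bool = False


@dataclass
class Society:
    goals: list                                   # overall objectives of the society
    rules: list                                   # social rules, here plain strings
    roles: dict = field(default_factory=dict)     # role name -> Role
    members: dict = field(default_factory=dict)   # agent name -> Agent

    def add_role(self, role):
        self.roles[role.name] = role

    def join(self, agent, role_name):
        """Admit an agent only if it commits to the rules and adopts a known role."""
        if not agent.committed_to_rules:
            raise PermissionError(f"{agent.name} has not committed to the social rules")
        if role_name not in self.roles:
            raise KeyError(f"unknown role: {role_name}")
        agent.roles.add(self.roles[role_name])
        self.members[agent.name] = agent
        return agent
```

The sketch only captures the requirement that every member adopts at least one role and commits to the social rules; how an institution verifies or enforces that commitment is left open here.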




Modelling Agent Societies; Co-ordination Frameworks and Institutions 






3. The Role of Institutions 

Usually human organisations and societies use norms and conventions to cope with 
the challenge of social order. Norms and conventions specify the behaviour that 
society members are expected to conform to and are suitable for decentralised control. 
In most societies, norms are backed by a variety of social institutions that enforce law 
and order (e.g. courts, police), monitor for and respond to emergencies (e.g. 
ambulance system), prevent and recover from unanticipated disasters (e.g. coast 
guard, fire-fighters), etc. In this way civilised societies allow citizens to utilise 
relatively simple and efficient rules of behaviour, offloading the prevention and 
recovery of many problem types to social institutions that can handle them efficiently 
and effectively by virtue of their economies of scale and widely accepted legitimacy. 
Successful human institutions achieve sustainability of citizens and increase the 
welfare of the society as a whole. Several researchers have recognised that the design 
of agent societies can benefit from abstractions analogous to those employed by our 
robust and relatively successful societies and organisations. There is a growing body 
of work that touches upon the concepts of norms and institutions in the context of 
multi-agent systems (cf. [9, 10, 12]). 

The benefit of an institution resides in its potential to lend legitimacy and security 
to its members by establishing norms. The electronic counterpart of the physical 
institution does a similar task for software agents: it can engender trust through 
certification of an agent and by the guarantees that it provides to back collaboration. 
However, the electronic institution can also function as the independent place in 
which all types of agent-independent information about the interaction between the 
agents within the society are stored. For example, it defines the message types that can be used 
by the agents in their interactions, the rules of encounter, etc. In general, institutions 
make it possible to: 

Specify the co-ordination structure that is used 

Describe exchange mechanisms of the agent society 

Determine interaction and communication forms within the agent society 

Facilitate the perception of individual agents of the aims and norms of an 
agent society 

Enforce the organisational aims of the agent society 

In our approach we consider that an agent society consists of two layers: one is 
facilitation-oriented and the other goal-oriented. The institution acts as mediator and 
animator for the members, who bring various skills and services, and customers (or 
groups of customers) who bring their problems and requirements. The most important 
service the institution provides is to regulate the interaction between members. 
Because the way interaction between agents happens depends on the co-ordination 
model, institutions will need to be defined differently for each co-ordination model. 

We have shown above that co-ordination models provide a setting for agent 
societies by setting out the goals of the society and the roles (what you can do) needed 
to achieve those goals. Institutions will enforce this model by setting out the scenes 
(where you can do it) and protocols (what you can say) for interaction in the society. 
This defines how agents can interact with the institution or with other agents in the 
society. The value of an institution lies in the additional services it can provide 
and in the trust and guarantees that are established through the institution's credibility 
and norms. 

Looking at the structure of organisations we can anticipate the types of interaction 
involved in a particular co-ordination model. Thus, an institution 
defines a performative structure and a dialogical framework, by which we mean, it 
prescribes the actions members can take and when and where to perform those 
actions, and determines the form of conversations between members. Therefore, the 
way norms and conventions are specified and enforced in a society depends on the co- 
ordination model. In hierarchies, norms and conventions can be embedded in the 
power relations. These relations determine which agent can demand an action from 
which other agent or which agent has priority over the resources. The controlling 
agent is supposed to uphold the norms of the society by managing the sub-ordinate 
agents according to them. In markets, norms and conventions are for a large part 
embedded in the market mechanism chosen. For example, auction mechanisms try to 
ensure that all agents get an opportunity to acquire a resource relative to their private 
value for that resource. Cheating by over- or underbidding does not lead to any 
benefit for the agent and is thus prevented by the mechanism itself. In network models 
explicit roles are defined to ‘represent’ the institutions that enforce monitoring and 
trust, and trace the fulfilment of contracts. Some examples of these roles will be given 
in the next section. 
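
To make the remark about markets concrete, the sketch below uses a sealed-bid second-price (Vickrey) auction. This mechanism is our choice for illustration and is not named in the text above; it is a standard example in which the winner pays the second-highest bid, so that deviating from one's private value cannot improve the outcome. The function names and the bidder data are hypothetical.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids maps a bidder name to its bid; at least two bidders are assumed.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price


def utility(bidder, private_value, bids):
    """Private value minus price if the bidder wins, otherwise zero."""
    winner, price = second_price_auction(bids)
    return private_value - price if winner == bidder else 0


# Agent "a" values the resource at 10; the other bids are fixed.
others = {"b": 7, "c": 4}
truthful = utility("a", 10, {**others, "a": 10})   # wins, pays 7
overbid = utility("a", 10, {**others, "a": 15})    # wins, still pays 7
underbid = utility("a", 10, {**others, "a": 5})    # loses the resource
```

In this example neither overbidding nor underbidding yields more than the truthful bid, which is the sense in which the norm is embedded in the mechanism rather than enforced by a separate institutional role.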



4. Co-ordination Models 



We identify three basic co-ordination types of agent societies following on the 
classification of organisations used in organisational theory. Hence, co-ordination of 
agent societies follows a market, network or hierarchy model. Each co-ordination 
model determines a different framework for agent societies that describes the 
institutional layer of the society. The institutional layer must describe institutional 
roles, the way interactions between roles are organised and the way the interface 
between the society and the ‘outside world’ is defined. That is, the co-ordination 
model determines the institutional roles, social norms and interaction forms in the 
society. 

In markets, agents are self-interested (determine and follow their own goals) and 
value their freedom of association and own judgement above security and trust issues. 
Network organisations are built around general patterns of interaction or contracts. 
Relationships are dependent on clear communication patterns and social norms. 
Agents in a network society are still self interested but are willing to trade some of 
their freedom to obtain secure relations and trust. Finally, in a hierarchy interaction 
lines are well defined and the facilitation level assumes the function of global control 
of the society and co-ordination of interaction with the outside world. Table 1 gives 
an overview of the characteristics of different agent societies. 

The characteristics and requisites for each role determine the required capabilities 
of agents fulfilling the role in terms of its communicative and reasoning capabilities. 
For example, agents acting in a network are expected to negotiate their interaction 
procedures and are motivated by mutual interest. This means such agents will be 
required to be able to reason about other agents and need to possess ‘heavy’ 
negotiation algorithms. On the other hand, members of a hierarchical society follow 
pre-determined communication lines and have limited need for negotiation, thus 
agents fulfilling hierarchical roles can be much simpler in terms of communication 
and negotiation capabilities. 



Table 1. Characteristics of agent societies 

                    Market                     Network                   Hierarchy 
Type of society     Open                       Trust                     Closed 
Members' 'values'   Self interest              Mutual interest           Dependency 
Society purpose     Exchange                   Collaboration             Production 
Interaction         Interaction is based on    Both interaction and      Specified on 
                    standards; communication   exchange procedures       design 
                    concerns exchange only     can be negotiated 


In order to be able to assign roles to agents, the society model must be able to make 
some assumptions on the capabilities of the agent. However, since open societies are 
based on the principle that participating agents are developed independently from the 
society, it is not possible to make too many assumptions on the specific architecture 
of agents. We use a generic agent model as a basis for our assumption on agents. This 
model is based on the work of [4]. This model makes no demands on the way internal 
agent components are designed, but assumes that agents will in some way be able to 
use the indicated capabilities. Agent engineers are free to design their agents’ internal 
components in different ways, and even do without some of the components. The 
description of roles in the society model refers to this agent model and describes the 
society expectations on the capabilities of agents that perform the role. 

We have developed a methodology (described in more detail in [11]) for the design 
of agent societies based on co-ordination structures. The aim of the methodology is to 
provide generic facilitation and interaction frameworks for agent societies that 
implement the functionality derived from the co-ordination model applicable to the 
problem domain. We can compare this process to the design of a generic enterprise 
model including roles such as accountants, secretaries and managers, as well as their job 
descriptions and relationships, and then extending it with the functions necessary to 
achieve the objectives of the given enterprise. These are, for example, designers and 
carpenters if the firm is going to manufacture chairs, and programmers and system 
analysts if the enterprise is a software house. 



4.1. Roles in the Market Co-ordination Model 



The main goal of a market is to facilitate exchange between agents. In a market 
model, agents are self-interested (determine and follow their own goals), represent (or 
provide) services and/or competencies and compete to perform tasks leading to the 
satisfaction of their own individual objectives. Agents are usually assumed to be 
heterogeneous and the negotiation rules are fixed (for example Contract Net or Dutch 
auction). Interaction in markets occurs through communication and negotiation in 
accordance with the market rules. 

Co-ordination through a market mechanism is particularly well suited to 
situations in which resources can be described easily or are commoditised, there are 
several agents offering the same (type of) resources and several agents that need 
them. Besides obvious e-commerce applications, the market architecture is also a 
good choice to model product or service allocation problems. Being self-interested, 
agents will first try to solve their own local problem, and then agents can potentially 
negotiate with other agents to exchange services or goods in shortage or in excess. 
Agent societies based on the market model have been used to represent virtual 
enterprises [19]. Facilitation roles necessary for the organisation of a market model 
are: 

Identification: has the task of registering members of the society. Can also 
receive requests from matchmakers or bankers. 

Matchmaker: keeps track of agents in the system, their needs and possibilities 
and mediates in the matching of demand and supply of goods or services. 
Depending on the domain, the task of a matchmaker can be a simple unification 
algorithm or a complex fuzzy matching algorithm. Matchmakers must be able to 
receive requests from agents and contact possible partners. Depending on the 
domain, these capabilities can be as simple as a message request(buyer, product, 
price) or announce(seller, product, price), or they can involve more general 
communication determining the requirements on both products and potential 
partners. Furthermore, matchmakers need to have knowledge of current sellers and 
requests in the society. That is, they need to maintain a kind of yellow pages. 

Banking: defines ways to value the goods to be exchanged and determines profit 
and fairness of exchanges. A banking service builds confidence for customers as 
well as offering guarantees to the members of the society. Bankers must be able to 
receive requests from agents wishing to register themselves (open an account) or 
wishing to get information on other agents, and need to keep knowledge on their 
clients. 
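
The simplest form of matchmaking mentioned above, exact matching on announce and request messages, can be sketched as follows. This is a minimal illustration under our own assumptions: the class and method names are hypothetical, the price in a request is interpreted as the maximum the buyer will pay, and matches are returned cheapest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    seller: str
    product: str
    price: float


class Matchmaker:
    """Keeps a directory of current offers and answers buyer requests."""

    def __init__(self):
        self._offers = []  # the matchmaker's directory of announced offers

    def announce(self, seller, product, price):
        """Register a seller's offer in the directory."""
        self._offers.append(Offer(seller, product, price))

    def request(self, buyer, product, max_price):
        """Return offers for the product at or below max_price, cheapest first."""
        matches = [
            offer for offer in self._offers
            if offer.product == product
            and offer.price <= max_price
            and offer.seller != buyer  # an agent is not matched with itself
        ]
        return sorted(matches, key=lambda offer: offer.price)
```

A fuzzy or multi-attribute matcher would replace the exact comparison on product and price inside request, while the directory and the two message types could stay the same.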



4.2. Roles in the Hierarchy Co-ordination Model 

Hierarchies co-ordinate the flow of resources or information by controlling and 
directing it at a central point in the managerial hierarchy. Interaction and design are 
determined by managerial decisions and achievement of global goals is most critical. 
Demand parties do not select a supplier from a group of potential suppliers: they 
simply work with a predetermined one. In hierarchical systems, each agent controls a 
statically defined sub-hierarchy (possibly empty), in many cases an administrative 
domain of some kind. Environments where the workflow is fixed and cases are 
repetitive, such as in automated manufacturing are well suited to the hierarchical 
model. In such systems, reliable control of resources and information flow requires 
central entities that manage local resources and data but also need quick access to 
global ones. Hierarchical models of agents have been used to model information 
agents ([6]) and the management of communication networks ([14]). 

In a hierarchical co-ordination model, agents at facilitation level are mainly 
dedicated to the overall control and optimisation of the system activities. Sometimes, 
these facilitation activities are concentrated in one agent, typically the ‘root’ agent of 
the hierarchy. Facilitation roles necessary to the organisation of a hierarchy are: 

Controllers: monitor and orient the overall performance of the system or of a 
part of the system. Autonomous agents have a local perspective and their actions 
are determined by their local state. Therefore, in a hierarchical co-ordination model 
it is necessary to have an agent whose role is to control the overall performance 
of the system. 

Interface agents: are responsible for the communication between the system and 
the ‘outside world’. In this architecture communication lines between agents are 
predefined. Furthermore, agents are usually not free to enter or leave the system. 
Therefore communication with the outside world must be regulated at the 
facilitation level. 



4.3. Roles in the Network Co-ordination Model 

Networks are coalitions of self-interested agents that agree to collaborate to achieve a 
mutual goal. Agents in a network society are self-interested but are willing to trade 
some of their freedom to obtain secure relations and trust. Instead of a direct exchange 
as in markets, agents in a network model are willing to trade their services in 
exchange for later or soft rewards (such as an increase of prestige). Network 
co-ordination models are built around general patterns of interaction or contracts. 
Relationships are dependent on clear communication patterns and social norms. Co- 
ordination is achieved by mutual interest, possibly using trusted third parties, and 
according to well-defined rules and sanctions. These coalitions have been studied in 
the area of game theory and Distributed Artificial Intelligence (DAI) [20]. Dellarocas 
introduces the concept of Contractual Agent Societies (CAS) as a model for 
developing agent societies [7]. Network co-ordination models provide an explicit 
shared context, describing rules and social norms for interaction and collaboration. 
The society is responsible for making its rules and norms known to potential members. 
Agents in a network society enter a social contract with the society in which they 
commit themselves to act within and according to the norms and rules of the society. 

At the facilitation level of a network, agents monitor, register and help others form 
contracts, introduce (teach) new agents to the rules of the market and keep track of the 
reputation of agents. Furthermore, they keep and enforce the ‘norms’ of the agent 
community and ensure interaction. Roles at facilitation level in networks are: 

Matchmaker: keeps track of agents in the system, their needs and possibilities 
and mediates in the matching of demand and supply of goods or services. In the 
network co-ordination domain, the matching of supply and demand is usually 
more complex than in markets, because long-term interests have to be taken into 
account. Therefore, matchmakers will need to use, for instance, fuzzy matching 
algorithms, or multi-attribute matching to be able to perform their tasks. As in 
markets, matchmakers must be able to receive requests from agents and contact 
possible partners and need to keep knowledge of current offers and requests in 
the society. 

Gatekeeper: is responsible for accepting and introducing new agents to the 
market. Agents entering the marketplace must be informed about the possibilities 
and capabilities of the market. Gatekeepers negotiate the terms of a social 
contract between the applicant and the members of the market. 

Notary: registers collaboration contracts between agents. 

Monitoring agents: are trusted third parties that keep track of the execution of 
collaboration contracts between agents. 
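
The facilitation roles listed in sections 4.1 to 4.3 can be summarised as a simple lookup from co-ordination model to roles. The sketch below is only a restatement of those lists as data; the identifiers (FACILITATION_ROLES, facilitation_roles, shared_roles) and the lower-case role labels are hypothetical.

```python
# Facilitation roles per co-ordination model, as listed in sections 4.1 to 4.3.
FACILITATION_ROLES = {
    "market": ("identification", "matchmaker", "banking"),
    "hierarchy": ("controller", "interface agent"),
    "network": ("matchmaker", "gatekeeper", "notary", "monitor"),
}


def facilitation_roles(model):
    """Return the facilitation roles for a co-ordination model name."""
    try:
        return FACILITATION_ROLES[model.lower()]
    except KeyError:
        raise ValueError(f"unknown co-ordination model: {model}") from None


def shared_roles(model_a, model_b):
    """Return the set of facilitation roles that two models have in common."""
    return set(facilitation_roles(model_a)) & set(facilitation_roles(model_b))
```

Note that a role with the same name, such as the matchmaker, may still require different capabilities in different models, as discussed above for markets and networks.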

5. Applications 

The framework described in this paper can be applied to very distinct problem 
domains, because it concentrates on the organisational elements of the agent societies. 
At Achmea, a financial and insurance holding organisation operating mainly in the 
Netherlands, the ideas described in this paper are being applied to the development of 
a system for support of knowledge sharing (K-Exchange). This project is further 
described below. Other plans for applying this framework include the development 
of a mediation system in the area of secondary healthcare co-ordination (CareMarket). 
Although both projects are still in an initial phase and no results are as yet available, 
the models developed illustrate the possibilities of the different co-ordination 
frameworks and the use of institutions. 

CareMarket 

The aim of CareMarket, a community care project, is to provide Achmea clients 
with extra (unskilled) care services, which are not covered by professional 
organisations, or for which there are long waiting lists. The project is inspired by the 
LETS concept and based on non-monetary trading concepts. Matching of supply and 
demand in this kind of situation is not trivial. The fulfilment of a demand usually 
requires the co-ordination of several suppliers; suppliers are volunteers and usually offer 
a very limited and constrained range of services. Furthermore, it is desirable to keep a 
continuity of relationships between suppliers and clients (people tend to develop 
friendship relations with their care tenders / care takers and do not really appreciate 
seeing a new face every day). This pilot is in a very initial phase of development but 
there is already a clear realisation that the institutional framework described in this 
paper will be directly applicable to the development of an agent-based simulation 
prototype. The evaluation of the system through the simulated institution populated 
with intelligent agents, representing suppliers and clients, will provide insights and 
support to the eventual deployment of a real community pilot. 

Knowledge Exchange Network 

The objective of the Knowledge Exchange Network project is to help non-life 
insurance experts exchange knowledge with each other, in a way that preserves the 
knowledge, rewards the knowledge owner and reaches the knowledge seeker on a 
just-in-time, just-enough basis. Current users of the pilot project are project managers, 
product developers and actuaries in the Non-life group of Achmea, but in the future it will 
be extended to other people (e.g. call centre employees) and groups. Members of the 
network have a great deal of knowledge that is valuable and useful to the others. 
So, one of the main tasks of the Knowledge Exchange Network is to support and 
encourage their contacts. Experience shows that any technological support for 
knowledge exchange greatly improves if users feel they know and can trust each 
other. Therefore, the Knowledge Management activities at the Non-life group consist 
of two parts: face-to-face workshops with the aim of getting people to know each 
other, share their experiences and extend their knowledge and a virtual network, 
aiming both at a knowledge repository and at the support of communication and 
collaboration. 

For the share support module, an agent society is being developed using the 
framework-based design method described in this paper. In this society, both 
knowledge seekers and knowledge owners want to be able to decide on trade partners 
and conditions. Sharing is not centrally controlled but greatly encouraged by the 
management. The best-suited partner, according to each participant’s own conditions 
and judgement, will get the ‘job’. However, factors such as privacy, secrecy and 
competitiveness between brands and departments may influence the channels and 
possibilities of sharing and must thus be considered. 

The requirements for the system identify a distributed system where different 
actors, acting autonomously on behalf of a user, and each pursuing its own goals, 
need to interact in order to achieve their goals. Communication and negotiation are 
paramount. Furthermore, the number and behaviour of participants cannot be fixed a 
priori and the system can be expected to expand and change during operation, both in 
number of participants and in amount and kind of knowledge shared. These 
characteristics indicate a situation for which the agent paradigm is well suited and 
therefore the methodology we propose can be applied. 

Considering the requirements, the network model is the most appropriate for this 
situation. The aim is to design an exchange society restricted to selected participants 
with the global goal of supporting collaboration and synergy, and in this way meet the 
organisation requirements. Participants are aware of this requirement and willing to 
co-operate with it, but also have their own objectives and constraints. Participants wish to be 
free to determine their own exchange rules and to be assured that there is control over 
who the other participants in the environment are. 

Due to space limitations, we cannot describe the complete system in this paper. In 
the following we will describe some of the roles and interactions. Having decided on 
a network structure, the roles of matchmaker, notary, monitor, and gatekeeper follow 
naturally from the application of the framework. From the domain requirements the 
roles of knowledge owner and knowledge seeker can be deduced. The ‘goods’ to be 
exchanged are the contents of the knowledge repository, that is, (XML) documents 
representing knowledge about reports, people, applications, web sites, projects, 
questions, etc.* Figure 1 shows a fragment of the architecture of the society, indicating 
roles and possible interaction procedures. These procedures are also determined by 
the model chosen (network) and are informally described. 

The institution underlying the society also imposes mechanisms for collaboration 
and certification. For instance, in the knowledge network a special kind of knowledge 
owner is responsible for the gathering and dissemination of information about a 
known, fixed list of subjects to knowledge seekers that subscribed to it. The 
institution must enforce the norm that such agents are required to provide all the 
information they are aware of. This gives the monitors that trace this type of 
contract the task of checking whether information on all subjects in the list is indeed 
provided. 



* This type of goods demands a complex matching mechanism, since matches are not at 
keyword level but require knowledge about relationships, processes, etc. This imposes 
constraints on the task and communicative components of agents. This will not be discussed 
here. 

Facilitation layer for Network Society 

membership_application(X, gatekeeper): 

This is a negotiation between any agent and the gatekeeper of the society resulting in either 
an acceptance, that is, X will become a member of the society, or a rejection. 
The role the agent will play is also determined in this scene. 

register(M, matchmaker): 

Knowledge owners or seekers can register their requests with the matchmaker, 
who will use this information in future matches. 

request_partner(M, matchmaker): 

Knowledge owners or seekers request possible partners for an exchange. 
Results in a possibly empty list of potential partners. 

negotiate_partnership(M, N): 

Owners and seekers check the viability of an exchange and determine conditions. 

make_contract(M, N, notary): 

When an agreement is reached, partners register their commitments with the notary. 

appoint(notary, monitor): 

The notary appoints a monitor for a contract. It delegates agreed tasks to the monitor. 
The monitor will keep track of contract status and will act when an undesired state is reached. 

apply_sanction(monitor, M): 

When a breach of contract occurs, the monitor will contact the faulty party and apply the 
sanctions agreed upon (either described in the contract or standard in the institution). 



Fig. 1. Fragment of the Knowledge Exchange Network architecture 



6. Conclusions and Future Work 

We have presented a framework for the design of agent societies based on the co- 
ordination structure of the domain that uses institutions to specify and enforce social 
norms and conventions. The framework takes the organisational perspective as 
starting point. We believe that one contribution of our research is that it describes the 
implications of the co-ordination model of the organisation for the architecture and 
design method of the agent society being developed. Although there are several agent-based 
software engineering methodologies (see [8, 3, 16, 21]), these are often either 
too specific or too formal and not easily used and accepted. Our approach is to 
provide a generic frame that directly relates to the organisational perception of a 
problem. If needed, existing methodologies can be used for the development, 
modelling and formalisation of each step. We believe that our approach will 
contribute to the acceptance of multi-agent technology by organisations. 

We also exposed the need for institutions in systems of autonomous agents that act 
according to their own goals and capabilities. Institutions enforce the global 
behaviour of the society and assure that the global goals of the society are met. 
Institutions play an important role in specifying and managing the conventions of the agent 
society. One of the most important aspects is that they can make organisational goals 
and norms explicit and warrant their fulfilment by providing explicit facilitation roles 
and controlled interaction protocols. Different co-ordination models have different 
needs in terms of how institutions are specified. Feedback from the applications 
currently under development at Achmea will be used to improve the design 
methodology and the co-ordination frameworks used. 

Important work that is left for the future is the formal description of both the 
co-ordination framework and the institutions. This will provide means for verifying 
properties of the institution. It will also enable agents that consider joining the society 
to determine whether they are able and willing to conform to the specified conventions and 
interaction mechanisms. 

Abstract. Decentralised co-operative multi-agent systems are computational 
systems where conflicts are frequent due to the nature of the 
represented knowledge. Negotiation methodologies, in this case argumen- 
tation based negotiation methodologies, were developed and applied to 
solve unforeseeable and, therefore, unavoidable conflicts. 

The supporting computational model is a distributed belief revision sys- 
tem where argumentation plays the decisive role of revision. The dis- 
tributed belief revision system detects, isolates and solves, whenever pos- 
sible, the identified conflicts. The detection and isolation of the conflicts 
is automatically performed by the distributed consistency mechanism 
and the resolution of the conflict, or belief revision, is achieved via argu- 
mentation. 

We propose and describe two argumentation protocols intended to solve 
different types of identified information conflicts: context dependent and 
context independent conflicts. While the protocol for context dependent 
conflicts generates new consensual alternatives, the protocol for context independent 
conflicts adopts the soundest, strongest argument presented. The paper shows the 
suitability of using argumentation as a distributed decentralised belief 
revision protocol to solve unavoidable conflicts. 



1 Introduction 

Co-operative, decentralised multi-agent systems are computational systems com- 
posed of autonomous agents capable of solving distributed complex problems, 
which they are unable to do individually. Each agent plays the role of an expert 
in a specific system knowledge sub-domain and contributes, within its sphere of 
competence, to the achievement of the overall goal of the system. Since these systems 
have no central control, the agents are fully responsible for the information 
they share or the actions they take and, therefore, have to rely on appropriate 
methodologies for co-operation. 

Co-operative interactions generate elaborate inter-agent knowledge dependency 
networks which, in our case, are based on the assumption that trustful, 
benevolent agents convey only correct information and results. However, this as- 
sumption does not guarantee the overall consistency of the system’s knowledge 
since agents are resource bounded entities that have a partial understanding of 
the global problem and suffer from incomplete knowledge. Consequently, con- 
flicts regarding the beliefs of the different agents are frequently declared. These 
disparate perspectives can either be reconcilable or incompatible, falling, respec- 
tively, into the categories of negative and positive conflicts [14]. In particular, 
this paper is concerned with solving negative conflicts that occur when agents 
hold contradictory beliefs. 

Negotiation based conflict resolution protocols, particularly argumentation 
protocols, are well suited for autonomous co-operative agents. In argumentation 
based negotiation, or simply argumentation, the agents generate and exchange 
arguments to support their findings or conclusions, evaluate them and agree 
to accept the soundest arguments presented or the first consensual alternative 
proposed. 

In this paper, argumentation is regarded as a distributed revision protocol and, 
thus, is a functionality of a distributed decentralised belief revision system. 
The autonomous agents are modelled as individual reason maintenance systems 
where beliefs are always represented with their supporting justifications. A belief’s 
justification constitutes a full argument in favour of the belief and, in our 
case, is composed of the set of elementary beliefs (called foundations) that support 
the belief. So, when an agent shares a belief with others, it sends the belief 
together with its supporting arguments. The receiver agents, with this additional 
information, verify if the arrived belief is consistent with the already represented 
beliefs and, in case of conflict, immediately identify and isolate the conflicting 
set of beliefs in order to preserve consistency. However, consistency maintenance 
alone does not solve conflicts. To attempt to solve the detected conflicts, 
negotiation-based strategies (in our case, argumentation) need to be applied. 

We propose and describe two argumentation protocols intended to solve different 
types of identified information conflicts: context dependent and context independent 
conflicts. While the protocol for context dependent conflicts generates 
new consensual alternatives, the protocol for context independent conflicts adopts 
the soundest, strongest argument presented. The strategies implemented are the 
decision functions used to choose the soundest argument or to provide new 
conflict-free alternatives. 

The objective of this paper is to show the appropriateness of argumentation 
as a distributed belief revision protocol for decentralised co-operative conflict 
solving by describing how this extension was implemented and by presenting the 
developed argumentation strategies. 

After this introductory section we briefly present our motivation and approach 
and describe the argumentation based conflict solving protocol developed. 
We start by presenting the protocol for solving Context Independent Conflicts 
and then follow with the protocol for solving the Context Dependent Conflicts. 
In the fifth section we discuss our work by comparing it with related work, and, 
in the last section we draw our conclusions. 



2 Motivation and Approach 

The purpose of the majority of the negotiation protocols found in the literature 
(for example [10], [15], [17] or [16]) is to avoid the declaration of potential future 
conflicts. A typical case is when two agents detect that their future actions, goals 
or plans are conflicting and use negotiation to find conflict-free alternatives. The 
work presented in this paper, rather than avoiding the declaration of conflicts, is 
focused on the actual resolution of conflicts that cannot be predicted beforehand 
and, therefore, must be identified, isolated and, whenever possible, solved. 

The main motivation of this work came from the proposed problem domain: 
the determination of appropriate locations for new project developments. This 
real world domain consists of multiple public evaluation agencies that typically 
reach verdicts independently, i.e., without realising that they share knowledge 
(legislation) and that their judgements and recommendations are interdependent. 
This type of procedure results in long and costly (re-)submission, evaluation 
and alteration cycles, both for the public agencies and for project developers. 
The goal was to design an intelligent decision-making tool capable of 
finding adequate project locations while correcting this undesirable real world 
behaviour. 

The approach followed was modular and incremental. First, the identified fea- 
tures of the problem domain suggested the design of a co-operative decentralised 
multi-agent system where distributed volatile information could be adequately 
represented, used and maintained. The individual agents were remotely inspired 
by the ARCHON architecture [18], and are structured in two main layers: the 
intelligent system layer and the cooperation layer. The intelligent system layer is 
modelled as a reason maintenance system which includes an Assumption-based 
Truth Maintenance System (ATMS) [2]. A reason maintenance system can represent 
and reason with dynamic data by using beliefs since beliefs, unlike facts, 
are represented with their supporting justifications. The agents’ knowledge bases 
(composed of beliefs and facts) are updated according to the most recent findings 
and their consistency is preserved by the automatic detection and isolation 
of conflicts. The cooperation layer includes a self model, an acquaintances model 
and a communication module. The description of the architectural and functional 
details of the system can be found in [11]. Another possible agent architecture 
could be a BDI-like architecture [7]. The BDI agents are considered rational 
agents that have certain mental attitudes of Belief, Desire and Intention (BDI), 
representing, respectively, the information, the motivational and the deliberative 
states of the agent. 

On top of these functionalities, we added an argumentation extension respon- 
sible for solving the identified information conflicts. The autonomous agents 
engage in co-operative interactions by sharing beliefs and their justifications, 
maintain and update their internal beliefs, and are responsible for communicating 
relevant changes (revisions included) to their counterparts. The resulting 
distributed belief revision system is composed of an initial consistency preservation 
stage followed by a final argumentation-based conflict resolution phase. 
This view of belief revision as a truth maintenance process followed by a selection 
mechanism of one (or more) preferred solutions was also proposed by [3]. 

We developed two distributed belief revision protocols to solve: 

— Context Independent Conflicts, which occur when a distributed belief is, 
simultaneously, believed by some agents and disbelieved by others; 

— Context Dependent Conflicts, which occur when the agents detect inconsis- 
tent distinct beliefs. 

In the first case, the goal is to choose the most appropriate belief status to adopt, 
and, in the latter case, to find consensual alternatives to support the affected 
beliefs. The proposed conflict resolution protocols, although distinct from the 
typical negotiation based conflict resolution protocols that perform a distributed 
search through the space of possible solutions [8], have identical goals: both try 
to reach agreements acceptable to all the parties involved by: 

— choosing between conflicting views, by comparing the reasons behind each 
stance and choosing the strongest view; 

— building a new consensual view, by searching for fully acceptable alternative 
foundations for the disputed information. 

The problem domain was then modelled, using this architecture, as a co-operative 
multi-agent system composed of autonomous agents with belief revision capabilities. 
The resulting distributed belief revision system contains two types 
of functional agents: decision-making agents and information providing agents. 
While the information providing agents classify and provide diverse thematic 
information for a given geographic region (they usually include geographical information 
systems), the role of the decision-making agents is to find locations that 
satisfy the requirements of the submitted projects and also comply with the 
legislation applicable to the geographical region under consideration. 

The distributed belief revision activity plays a key role in this platform since 
it allows the autonomous agents to take into account the existing knowledge 
dependencies, to detect any information conflicts and, whenever possible, solve 
the distributed conflicts. Argumentation is particularly well suited for this domain 
because the system needs to produce a final verdict which can be justified 
and that results from a global consensus. Suppose, for example, that the system 
is trying to find a location for a new farm development and a conflict between 
Agent1 and Agent2 occurs: Agent1 believes that "sandstone is appropriate for 
agriculture" and Agent2 disbelieves it. This conflict, which is domain independent, 
is addressed via argumentation: the agents argue in favour of their own 
perspective by calculating and exchanging the respective credibility and agree 
to adopt the most credible perspective. Consider that another conflict is then 
detected between Agent1 and Agent2: Agent1 believes that "irrigated land is 
appropriate for agriculture" and Agent2 believes that "water resources must be 
preserved". This conflict, which is domain dependent, is also addressed via 
argumentation: the agents try to find new consensual alternatives regarding the 
concepts in conflict. Agent2 informs Agent1 that it has no alternatives. Agent1 
finds as a candidate that "range land is appropriate for agriculture". This 
alternative is evaluated and, since it is consensual, it is assumed by Agent1. 

The resulting system, named Distributed Project Location Multi-Agent Testbed 
(DIPLoMAT), is described in [12]. The DIPLoMAT system determines appropriate 
locations for the specified projects and, simultaneously, allows valuable 
what-if analysis. This ability, which is supported by distributed belief revision, 
provides immediate answers to questions like "What if a specific legislation rule 
is altered?", "What if a particular project requirement is changed?", and so forth. 
Before continuing with our paper we wish to present some definitions: 

1. Beliefs are first order logic sentences; 

2. Beliefs result from perception, communication, assumption or inference; 

3. Foundational beliefs or foundations are self-supported elementary beliefs 
which follow directly from perception or assumption; 

4. A belief’s justification is a minimal set of foundations on which it depends. 

5. A belief $\phi$ of agent $Ag$ (also referred to as agent $Ag$’s view of $\phi$) is represented 
by a tuple (also called an ATMS node) $\langle \phi_{Ag}, \mathcal{E}(\phi_{Ag}), \mathcal{F}(\phi_{Ag}), Ag \rangle$, where 
$\phi_{Ag}$ identifies the belief, $\mathcal{E}(\phi_{Ag})$ specifies the belief’s endorsement (observed, 
assumed, communicated or inferred), $\mathcal{F}(\phi_{Ag})$ contains the belief’s supporting 
justifications, and $Ag$ identifies the belief’s source agent. The belief status 
is established according to $\mathcal{F}(\phi_{Ag})$: (i) Believed, if $\mathcal{F}(\phi_{Ag}) \neq \{\emptyset\}$; (ii) 
Disbelieved, if $\mathcal{F}(\phi_{Ag}) = \{\emptyset\}$; 

6. A distributed belief $\phi$ is composed of all existing beliefs regarding $\phi$, i.e., it 
includes every agent’s view of $\phi$. For example, 
$\langle \phi_1, \mathcal{E}(\phi_1), \mathcal{F}(\phi_1), Ag_1 \rangle, \ldots, \langle \phi_n, \mathcal{E}(\phi_n), \mathcal{F}(\phi_n), Ag_n \rangle$. 
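As an illustration only, the tuple representation of definitions 5 and 6 can be sketched as a small data structure. The names used here (`BeliefNode`, `endorsement`, `justifications`, `distributed_belief`) are hypothetical and do not come from the described system, and the Disbelieved label is modelled, as an assumption, by a set holding a single empty justification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Hypothetical names throughout; the text only fixes the tuple
# <belief, endorsement, justifications, source agent>.
EMPTY_LABEL: Set[FrozenSet[str]] = {frozenset()}  # assumed encoding of the Disbelieved label


@dataclass
class BeliefNode:
    belief: str                          # identifier of the belief, e.g. "phi(a)"
    endorsement: str                     # "obs", "ass", "com" or "inf"
    justifications: Set[FrozenSet[str]]  # each justification is a set of foundations
    agent: str                           # source agent of this view

    @property
    def status(self) -> str:
        # Believed unless the justification set is the empty label.
        return "Disbelieved" if self.justifications == EMPTY_LABEL else "Believed"


def distributed_belief(nodes, belief):
    """Collect every agent's view of the same belief (definition 6)."""
    return [n for n in nodes if n.belief == belief]


views = [
    BeliefNode("phi(a)", "inf", {frozenset({"alpha1(a)", "beta1(a)", "r11"})}, "Agent1"),
    BeliefNode("phi(a)", "obs", {frozenset()}, "Agent3"),
    BeliefNode("psi(a)", "ass", {frozenset({"psi(a)"})}, "Agent1"),
]
phi = distributed_belief(views, "phi(a)")
print([(n.agent, n.status) for n in phi])
```

A distributed belief whose views carry different statuses, as in this small usage example, is exactly the situation the next section treats as a Context Independent Conflict.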

3 Context Independent Conflicts 

The Context Independent Conflicts result from the assignment, by different 
agents, of contradictory belief statuses to the same belief. Every agent not only 
maintains its individual beliefs but also participates in the maintenance of 
the distributed beliefs that include its own views. While the responsibility for 
individual beliefs lies with each source agent, the responsibility for distributed 
beliefs is shared by the group of agents involved. Three elementary processing 
criteria for the accommodation of distributed beliefs were implemented: 

— The CONsensus (CON) criterion - The distributed belief will be Believed, 
if all the perspectives of the different agents involved are believed, or Disbe- 
lieved, otherwise; 

— The MAJority (MAJ) criterion - The distributed belief will be Believed, as 
long as the majority of the perspectives of the different agents involved is 
believed, and Disbelieved, otherwise; 

— The At Least One (ALO) criterion - The distributed belief will be Believed 
as long as at least one of the perspectives of the different agents involved is 
believed, and Disbelieved, otherwise. 
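The three criteria above can be sketched as simple predicates over the list of individual views. This is a minimal illustration, not the implemented system: the function names are hypothetical, each view is reduced to a boolean (True for Believed), and "majority" is assumed to mean a strict majority.

```python
def con(statuses):
    """CONsensus: Believed only if every view is believed."""
    return all(statuses)


def maj(statuses):
    """MAJority: Believed if more than half of the views are believed (strict majority assumed)."""
    return sum(statuses) > len(statuses) / 2


def alo(statuses):
    """At Least One: Believed if any view is believed."""
    return any(statuses)


# Two agents believe, one disbelieves.
views = [True, True, False]
print(con(views), maj(views), alo(views))
```

For the same set of views the criteria become progressively more permissive, which is what allows the conflict resolution step to pick the criterion that yields the desired outcome.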

By default, i.e., when the distributed belief is consensual, the CON criterion is 
applied, otherwise the distributed belief results from the resolution of the de- 
tected Context Independent Conflict. The methodology we are proposing for 
solving the Context Independent Conflicts is organized in two steps: first, it 
establishes the desired outcome of the social conflict episode, and then, it ap- 
plies the processing criterion that solves the episode accordingly. During the 
first stage, the conflict episode is analysed to establish the most reasonable out- 
come. The necessary knowledge is extracted from data dependent features like 
the agents’ reliability [4] and the endorsements of the beliefs [5], allowing the 
selection, at runtime, of the most adequate processing criterion. In particular, 
in our work the reliability of an agent is domain dependent - an agent can be 
more reliable in some domains than in others. The dynamic selection of the processing 
criterion is based on the assessment of the credibility values associated with 
each belief status. Two credibility assessment procedures were designed: 

The Foundations ORigin based Procedure (FOR) - where the credibility of the 
conflicting perspectives is determined based on the strength of the foundations’ 
endorsements (observed foundations are stronger than assumed foundations) 
and on the domain reliability of the source agents; and 

The RELiability based Procedure (REL) - where the credibility of the conflicting 
views is based on the reliability of the foundations’ source agents. 

The methodology for solving Context Independent Conflicts starts by applying 
the FOR procedure. If the FOR procedure is able to determine the most credible 
belief status, then the selected processing criterion is applied and the episode is 
solved. However, if the result of the application of the FOR procedure is a draw 
between the conflicting perspectives, the Context Independent conflict solving 
methodology proceeds with the application of the REL procedure. If the REL 
procedure is able to establish the most credible belief status, then the selected 
processing criterion is applied and the episode is solved. 
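The sequential application of the two procedures can be sketched as a fallback chain. This is only an illustrative outline: `resolve_episode` is a hypothetical name, each procedure is assumed to return "Believed", "Disbelieved" or `None` for a draw, and the behaviour when every procedure draws is left open because the text does not specify it.

```python
from typing import Callable, Optional, Sequence

# A procedure returns the most credible belief status, or None on a draw (assumption).
Procedure = Callable[[], Optional[str]]


def resolve_episode(procedures: Sequence[Procedure]) -> Optional[str]:
    """Apply the procedures in order and stop at the first one that breaks the tie."""
    for procedure in procedures:
        outcome = procedure()
        if outcome is not None:
            return outcome
    return None  # still a draw; how this case is handled is not stated in the text


def for_draw() -> Optional[str]:
    return None  # stands in for a FOR evaluation that ends in a draw


def rel_believed() -> Optional[str]:
    return "Believed"  # stands in for a REL evaluation that favours Believed


print(resolve_episode([for_draw, rel_believed]))
```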

The sequential application of these procedures is ordered by the amount of 
knowledge used to establish the resulting belief status. It starts with the FOR 
procedure which calculates the credibility of the conflicting perspectives based 
on the strength of the endorsements and on the domain reliability of the sources 
(agents) of the foundations. It follows with the REL procedure which computes 
the credibility of the conflicting beliefs solely based on the domain reliability of 
the sources of the foundations. We will now describe in detail the procedures for 
solving Context Independent Conflicts mentioned above. 

3.1 The Foundations ORigin Based Procedure (FOR) 

Generic beliefs are supported by sets of foundational beliefs which follow directly 
from observation, assumption or external communication. Since communicated 
beliefs also resulted from some process of observation, assumption or communication 
of other agents, foundations are, ultimately, composed of just observed or 
assumed beliefs. Galliers [5] refers to a belief’s process of origin as the belief’s 
endorsement. 

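When a Context Independent Conflict involving a distributed belief $\phi$ occurs, 
the FOR procedure is invoked and the credibility values for each one of the 
conflicting views regarding $\phi$ are computed according to the following formulas: 

$$C(\phi, Bel) = \sum_{Ag=1}^{N} C(\phi_{Ag}, Bel) / N$$ 

$$C(\phi, Dis) = \sum_{Ag=1}^{N} C(\phi_{Ag}, Dis) / N$$ 

where $N$ is the number of agents involved in the conflict. The values of the 
credibility attached to each agent view are equal to the average of the credibility 
values of their respective sets of foundations. The credibility of a generic 
foundation of $\phi$ - for example, $\langle \alpha_{Ag}, \mathcal{E}(\alpha_{Ag}), \mathcal{F}(\alpha_{Ag}), Ag \rangle$, which belongs to 
domain $D$ - depends on the reliability of the foundation’s source agent in the 
specified domain ($\mathcal{R}(D, Ag)$), on the foundation’s endorsement ($\mathcal{E}(\alpha_{Ag})$) and on 
its support set ($\mathcal{F}(\alpha_{Ag})$): 

$$C(\alpha_{Ag}, Bel) = 1 \times \mathcal{R}(D, Ag), \text{ if } \mathcal{F}(\alpha_{Ag}) \neq \{\emptyset\} \text{ and } \mathcal{E}(\alpha_{Ag}) = obs;$$ 

$$C(\alpha_{Ag}, Bel) = 1/2 \times \mathcal{R}(D, Ag), \text{ if } \mathcal{F}(\alpha_{Ag}) \neq \{\emptyset\} \text{ and } \mathcal{E}(\alpha_{Ag}) = ass;$$ 

$$C(\alpha_{Ag}, Bel) = 0, \text{ if } \mathcal{F}(\alpha_{Ag}) = \{\emptyset\};$$ 

$$C(\alpha_{Ag}, Dis) = 1 \times \mathcal{R}(D, Ag), \text{ if } \mathcal{F}(\alpha_{Ag}) = \{\emptyset\} \text{ and } \mathcal{E}(\alpha_{Ag}) = obs;$$ 

$$C(\alpha_{Ag}, Dis) = 1/2 \times \mathcal{R}(D, Ag), \text{ if } \mathcal{F}(\alpha_{Ag}) = \{\emptyset\} \text{ and } \mathcal{E}(\alpha_{Ag}) = ass;$$ 

$$C(\alpha_{Ag}, Dis) = 0, \text{ if } \mathcal{F}(\alpha_{Ag}) \neq \{\emptyset\};$$ 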

Within this procedure, the credibility granted, a priori, to observed foun- 
dations and assumed foundations was, respectively, 1 and 1/2. The credibility 
of each foundation is also affected by the reliability of the origin agent (source 
agent) for the domain under consideration. As a result, each perspective is as- 
signed a credibility value equal to the average of the credibility values of its 
support foundations. The credibility of any perspective has, then, a value between 
0 and 1. A credibility value of 1 means that the perspective is 100% credible 
(it solely depends on observed foundations generated by agents fully reliable 
in the data domain), whereas a credibility value of 0 means that no credibility 
whatsoever is associated with the perspective. Semantically, the 1 and 1/2 values 
granted to observed and assumed foundations have the following meaning: 
evidence corroborated by perception is 100% credible, whereas assumptions 
have a 50% chance of being confirmed or contradicted through perception. 
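The credibility rules of the FOR procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function and variable names are hypothetical, each foundation is reduced to a triple (endorsement, believed flag, reliability of its source agent in the domain), and exact fractions are used so the averages can be read off directly.

```python
from fractions import Fraction

# A priori weights from the text: observed = 1, assumed = 1/2.
ENDORSEMENT_WEIGHT = {"obs": Fraction(1), "ass": Fraction(1, 2)}


def foundation_credibility(endorsement, believed, reliability, status):
    """Credibility a single foundation lends to `status` ("Bel" or "Dis")."""
    supports_status = (status == "Bel") == believed
    if not supports_status:
        return Fraction(0)
    return ENDORSEMENT_WEIGHT[endorsement] * reliability


def view_credibility(foundations, status):
    """Average credibility over the foundations of one agent view."""
    values = [foundation_credibility(e, b, r, status) for e, b, r in foundations]
    return sum(values, Fraction(0)) / len(values)


# A view resting on three observed and one assumed foundation,
# all believed and all from fully reliable agents (reliability 1).
view = [
    ("obs", True, Fraction(1)),
    ("obs", True, Fraction(1)),
    ("ass", True, Fraction(1)),
    ("obs", True, Fraction(1)),
]
print(view_credibility(view, "Bel"), view_credibility(view, "Dis"))  # 7/8 0
```

Lowering the reliability of any source agent scales down only the contribution of that agent's foundations, which is how past conflict outcomes influence later episodes.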

The FOR procedure calculates the credibility attached to each one of the 
conflicting belief statuses (Believed and Disbelieved), and chooses the multiple 
perspective processing criterion whose outcome results in the most credible belief 
status. If the most credible belief status is: (i) Disbelieved, then the CON criterion 
is applied to the episode of the conflict; (ii) Believed, then, if the majority of the 
perspectives are in favour of believing in the belief, the MAJ criterion is applied; 
otherwise the ALO criterion is applied to the episode of the conflict. The agents’ 
reliability in the domain under consideration is affected by the outcome of the 
Context Independent Conflict episodes processed so far. An episode winning 
agent increases its reliability in the specified domain, while an episode losing 
agent decreases its reliability in the specified domain. At launch time, the agents 
are assigned a reliability value of 1 for every knowledge domain, which, during 
runtime, may vary between 0 and 1. If the agent’s view won the conflict episode 
of domain D then 

$\mathcal{R}(D, Ag) = \mathcal{R}(D, Ag) \times (1 + N_w/N) \times f_{norm}$, where $N_w$, $N$ and $f_{norm}$ 
represent, respectively, the number of agents who won the episode, the total 
number of agents involved, and a normalization factor needed to keep the 
resulting values within the interval [0,1]; 

If agent $Ag$’s view lost a Context Independent Conflict episode of domain D then 

$\mathcal{R}(D, Ag) = \mathcal{R}(D, Ag) \times (1 - N_l/N) \times f_{norm}$, where $N_l$, $N$ and $f_{norm}$ 
represent, respectively, the number of agents who lost the episode, the total 
number of agents involved in the conflict episode and a normalization factor 
needed to keep the resulting values within the interval [0,1]. 

A domain reliability value of 1 means that the agent has been fully reliable, and 
a value near 0 means that the agent has been less than reliable. 
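The reliability update can be sketched as follows. The text does not define the normalization factor; the sketch assumes $f_{norm} = 1/(1 + N_w/N)$, a value read off the worked example in this section rather than stated by the method, and it assumes every agent in the episode is either a winner or a loser. The function name is hypothetical.

```python
from fractions import Fraction


def update_reliability(reliability, won, n_winners, n_losers):
    """Update an agent's domain reliability after one conflict episode.

    Assumption: f_norm = 1 / (1 + N_w / N), inferred from the worked example.
    """
    n_total = n_winners + n_losers
    f_norm = 1 / (1 + Fraction(n_winners, n_total))
    if won:
        factor = 1 + Fraction(n_winners, n_total)
    else:
        factor = 1 - Fraction(n_losers, n_total)
    return reliability * factor * f_norm


# Three agents, two on the winning side and one on the losing side.
print(update_reliability(Fraction(1), True, 2, 1))   # 1
print(update_reliability(Fraction(1), False, 2, 1))  # 2/5
```

Under this assumed normalization a winner's reliability is left unchanged while a loser's is reduced, so "increase" has to be read relative to the losing agents.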



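Example Suppose a multi-agent system composed of three agents, Agent1, 
Agent2 and Agent3, with the following knowledge bases: 

Agent1 has observed $\alpha(a)$, assumed $\beta(a)$ and has two knowledge production 
rules¹, $r_{1,1}: \alpha(X) \wedge \beta(X) \rightarrow \phi(X)$ and $r_{1,2}: \delta(X) \wedge \phi(X) \rightarrow \psi(X)$: 

$\langle \alpha_1(a), obs, \{\{\alpha_1(a)\}\}, Agent_1 \rangle$; 

$\langle \beta_1(a), ass, \{\{\beta_1(a)\}\}, Agent_1 \rangle$; 

$\langle r_{1,1}, obs, \{\{r_{1,1}\}\}, Agent_1 \rangle$; 

$\langle r_{1,2}, obs, \{\{r_{1,2}\}\}, Agent_1 \rangle$; 

Agent2 has observed $\alpha(a)$, assumed $\delta(a)$ and has one knowledge production 
rule, $r_{2,1}: \alpha(X) \wedge \delta(X) \rightarrow \phi(X)$: 

$\langle \alpha_2(a), obs, \{\{\alpha_2(a)\}\}, Agent_2 \rangle$; 

$\langle \delta_2(a), ass, \{\{\delta_2(a)\}\}, Agent_2 \rangle$; 

$\langle r_{2,1}, obs, \{\{r_{2,1}\}\}, Agent_2 \rangle$; 

Agent3 has observed $\phi(a)$: 

$\langle \phi_3(a), obs, \{\{\phi_3(a)\}\}, Agent_3 \rangle$. 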

Agent1 is interested in receiving information regarding $\alpha(X)$, $\beta(X)$, $\delta(X)$, $\phi(X)$ 
and $\psi(X)$, Agent2 is interested in $\alpha(X)$, $\delta(X)$ and $\phi(X)$, and Agent3 is interested 
in any information on $\phi(X)$. The agents, after sharing their results (according to 
their expressed interests), end up with the following new beliefs: 

Agent1 has received $\alpha_2(a)$, $\delta_2(a)$, $r_{2,1}$, $\phi_3(a)$, and $\phi_2(a)$, and has inferred 
$\phi_1(a)$ and $\psi_1(a)$: 

$\langle \alpha_2(a), com, \{\{\alpha_2(a)\}\}, Agent_2 \rangle$; 

$\langle \delta_2(a), com, \{\{\delta_2(a)\}\}, Agent_2 \rangle$; 

$\langle r_{2,1}, com, \{\{r_{2,1}\}\}, Agent_2 \rangle$; 

$\langle \phi_3(a), com, \{\{\phi_3(a)\}\}, Agent_3 \rangle$; 

$\langle \phi_1(a), inf, \{\{\alpha_1(a), \alpha_2(a), \beta_1(a), r_{1,1}\}\}, Agent_1 \rangle$; 

$\langle \phi_2(a), com, \{\{\alpha_1(a), \alpha_2(a), \delta_2(a), r_{2,1}\}\}, Agent_2 \rangle$; 

$\langle \psi_1(a), inf, \{\{\alpha_1(a), \alpha_2(a), \beta_1(a), \delta_2(a), r_{1,1}, r_{2,1}, \phi_3(a), r_{1,2}\}\}, Agent_1 \rangle$; 

¹ Rules are also represented as beliefs which may be activated or inhibited 
(believed / disbelieved). 

Agent2 has received $\alpha_1(a)$, $\beta_1(a)$, $\phi_3(a)$, and $\phi_1(a)$, and has inferred 
$\phi_2(a)$: 

$\langle \alpha_1(a), com, \{\{\alpha_1(a)\}\}, Agent_1 \rangle$; 

$\langle \beta_1(a), com, \{\{\beta_1(a)\}\}, Agent_1 \rangle$; 

$\langle r_{1,1}, com, \{\{r_{1,1}\}\}, Agent_1 \rangle$; 

$\langle \phi_3(a), com, \{\{\phi_3(a)\}\}, Agent_3 \rangle$; 

$\langle \phi_2(a), inf, \{\{\alpha_1(a), \alpha_2(a), \delta_2(a), r_{2,1}\}\}, Agent_2 \rangle$; 

$\langle \phi_1(a), com, \{\{\alpha_1(a), \alpha_2(a), \beta_1(a), r_{1,1}\}\}, Agent_1 \rangle$; 

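Agent3 has received $\alpha_1(a)$, $\alpha_2(a)$, $\beta_1(a)$, $\delta_2(a)$, $r_{1,1}$, $r_{2,1}$, $\phi_1(a)$ and 
$\phi_2(a)$: 

$\langle \alpha_1(a), com, \{\{\alpha_1(a)\}\}, Agent_1 \rangle$; 

$\langle \alpha_2(a), com, \{\{\alpha_2(a)\}\}, Agent_2 \rangle$; 

$\langle \beta_1(a), com, \{\{\beta_1(a)\}\}, Agent_1 \rangle$; 

$\langle \delta_2(a), com, \{\{\delta_2(a)\}\}, Agent_2 \rangle$; 

$\langle r_{1,1}, com, \{\{r_{1,1}\}\}, Agent_1 \rangle$; 

$\langle r_{2,1}, com, \{\{r_{2,1}\}\}, Agent_2 \rangle$; 

$\langle \phi_1(a), com, \{\{\alpha_1(a), \alpha_2(a), \beta_1(a), r_{1,1}\}\}, Agent_1 \rangle$; 

$\langle \phi_2(a), com, \{\{\alpha_1(a), \alpha_2(a), \delta_2(a), r_{2,1}\}\}, Agent_2 \rangle$; 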

At some point, Agent3 realizes, via observation, that $\phi(a)$ is no longer believed, 
i.e., $\langle \phi_3(a), obs, \{\emptyset\}, Agent_3 \rangle$. A first episode of a Context Independent 
Conflict regarding $\phi(a)$ is declared: $\phi_1(a)$ and $\phi_2(a)$ are believed while $\phi_3(a)$ is 
disbelieved. In the case of our conflict, the credibility values assigned to the believed 
status are obtained through the following expressions: 

$C(\phi_1(a), Bel) = (C(\alpha_1(a), Bel) + C(\alpha_2(a), Bel) + C(\beta_1(a), Bel) + C(r_{1,1}, Bel)) / 4$ 

$C(\phi_2(a), Bel) = (C(\alpha_1(a), Bel) + C(\alpha_2(a), Bel) + C(\delta_2(a), Bel) + C(r_{2,1}, Bel)) / 4$ 

$C(\phi_3(a), Bel) = C(\phi_3(a), Bel)$ 

As the default reliability value assigned to every agent domain was 1, the 
credibility values associated with the conflicting perspectives are: 

$C(\phi_1(a), Bel) = (1 + 1 + 1/2 + 1) / 4$ and $C(\phi_1(a), Dis) = 0$ 

$C(\phi_2(a), Bel) = (1 + 1 + 1/2 + 1) / 4$ and $C(\phi_2(a), Dis) = 0$ 

$C(\phi_3(a), Bel) = 0$ and $C(\phi_3(a), Dis) = 1$ 

resulting in $C(\phi(a), Bel) = 7/12$ and $C(\phi(a), Dis) = 1/3$. 

Since $C(\phi(a), Bel) > C(\phi(a), Dis)$, the multi-agent system decides to believe in 
$\phi(a)$. The first episode of the conflict regarding $\phi(a)$ was successfully solved 
through the application of the FOR procedure, and the processing criterion that 
must be applied to generate the adequate social outcome of this conflict episode 
is the MAJ criterion. Finally, the conflict domain reliability values of the agents 
involved in the conflict episode are updated accordingly. So the updated reliability 
values of Agent1, Agent2 and Agent3 for the domain under consideration ($D$) are: 

$\mathcal{R}(D, Agent_1) = 1 \times (1 + 2/3) / (1 + 2/3)$, i.e., $\mathcal{R}(D, Agent_1) = 1$, 

$\mathcal{R}(D, Agent_2) = 1 \times (1 + 2/3) / (1 + 2/3)$, i.e., $\mathcal{R}(D, Agent_2) = 1$, and 

$\mathcal{R}(D, Agent_3) = 1 \times (1 - 1/3) / (1 + 2/3)$, i.e., $\mathcal{R}(D, Agent_3) = 6/15$. 
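The arithmetic of this example can be checked with a short script. It is a sketch rather than part of the described system: the variable and function names are hypothetical, all reliabilities are taken as 1 as in the example, and a draw is represented by `None` as an assumption.

```python
from fractions import Fraction

half, one = Fraction(1, 2), Fraction(1)

# Credibility of each agent view: average over its foundations (reliabilities are all 1).
bel = {
    "Agent1": (one + one + half + one) / 4,  # alpha1, alpha2, beta1 (assumed), r11
    "Agent2": (one + one + half + one) / 4,  # alpha1, alpha2, delta2 (assumed), r21
    "Agent3": Fraction(0),                   # Agent3 no longer believes phi(a)
}
dis = {"Agent1": Fraction(0), "Agent2": Fraction(0), "Agent3": one}

n = len(bel)
c_bel = sum(bel.values()) / n
c_dis = sum(dis.values()) / n


def select_criterion(c_bel, c_dis, believed_views, total_views):
    """Pick the processing criterion whose outcome matches the most credible status."""
    if c_dis > c_bel:
        return "CON"
    if c_bel > c_dis:
        return "MAJ" if believed_views > total_views / 2 else "ALO"
    return None  # draw: the next procedure (REL) would be applied


print(c_bel, c_dis, select_criterion(c_bel, c_dis, 2, 3))  # 7/12 1/3 MAJ
```

The reliability of the losing agent, 6/15, reduces to 2/5.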



3.2 The RELiability Based Procedure (REL) 

The REL procedure assigns each conflicting view a credibility value equal to
the average of the reliability values of the source agents of its foundations. The
credibility values associated with the different perspectives that contribute to each
belief status are added, and the REL procedure chooses the processing criterion
whose outcome results in adopting the most credible belief status. If the most
credible belief status is: (i) Disbelieved, then the CON criterion is applied to the
episode of the conflict; (ii) Believed, then, if the majority of the perspectives are
in favour of believing in the belief, the MAJ criterion is applied, else the ALO
criterion is applied to the episode of the conflict. The reliability of the agents in
the conflict domain is also affected by the outcome of the conflict episodes solved
so far. Episode winning agents increase their reliability in the conflict domain,
while episode losing agents decrease their reliability in the conflict domain (see
previous sub-section).
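
The criterion selection described above can be sketched as follows. This is only an
illustration under stated assumptions: the function name, the data layout and the
tie-breaking rule (a tie between the two statuses is treated as Believed) are
hypothetical and are not specified in the text.

```python
from fractions import Fraction

def rel_criterion(perspectives, reliability):
    """Pick the processing criterion for one conflict episode.

    perspectives: list of (status, sources) pairs, where status is "Bel" or
    "Dis" and sources lists the agents that originated the foundations.
    reliability: mapping agent -> reliability (int or Fraction) in the domain.
    """
    totals = {"Bel": Fraction(0), "Dis": Fraction(0)}
    for status, sources in perspectives:
        # Credibility of a view: average reliability of its source agents.
        totals[status] += Fraction(sum(reliability[a] for a in sources), len(sources))
    if totals["Dis"] > totals["Bel"]:
        return "CON"  # most credible status is Disbelieved
    # Most credible status is Believed (ties included: an assumption).
    believers = sum(1 for status, _ in perspectives if status == "Bel")
    return "MAJ" if 2 * believers > len(perspectives) else "ALO"

reliability = {"Agent1": Fraction(1), "Agent2": Fraction(1), "Agent3": Fraction(2, 5)}
episode = [("Bel", ["Agent1"]), ("Bel", ["Agent2"]), ("Dis", ["Agent3"])]
print(rel_criterion(episode, reliability))
```

The example reuses the reliability values obtained at the end of the previous
sub-section; the perspectives themselves are made up for the illustration.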



4 Context Dependent Conflicts 

The detection of contradictory distinct beliefs naturally triggers the reason
maintenance mechanism. However, in the case of Context Dependent Conflicts,
consistency preservation is not immediately followed by the application of
conflict resolution strategies. The agents postpone resolution in an effort to
preserve their current beliefs, since the resolution of Context Dependent Conflicts
implies abandoning the foundations of the conflicting beliefs and generating
alternative consensual foundational beliefs. Resolution is delayed until the moment
when the system concludes that it cannot fulfil its goals unless it tries to solve
the latent conflicts. To solve this type of conflict the agents need to know how
to provide alternative support to the invalidated conclusions. This search for
"next best" solutions is a relaxation mechanism called Preference ORder based
procedure (FOR).

Each knowledge domain has lists of ordered candidate attribute values for 
some domain concepts. These lists contain sets of possible instances ordered by 
preference (the first element is the best candidate, the second element is the 
second best, and so forth) for the attribute values of the specified concepts. The
preference order values are affected by the proposing agents’ reliability in the 
domain under consideration. In the case of a foundational belief maintained by 
a single agent, the process is just a next best candidate retrieval operation. In 
the case of a distributed foundation, a consensual candidate has to be found. If 
the gathered proposals are: 

— Identical, then the new foundation has been found; 

— Different, then the agents that proposed higher preference order candidates 
generate new next best proposals, until they run out of candidates or a 
consensual candidate is found. 

The alternative foundations found through this strategy are then assumed by the 
system. The credibility measure of the resulting new foundations is a function of 
the lowest preference order candidate used and of the involved agents’ reliability. 
The agents’ reliability values in the domain under consideration are not affected 
by the resolution of Context Dependent Conflicts. 

5 Related Work 

Although the work presented in this paper was inspired by many previous con- 
tributions from research fields such as belief revision, argumentation based ne- 
gotiation and multi-agent systems, the approach followed is, as far as we know, 
novel. 

Authors like [16], [15] or [17] propose argumentation based negotiation protocols
for solving conflicts in multi-agent systems. However, they do not address the
problem of how to solve already declared conflicts, but concentrate on avoiding
the possible declaration of future conflicts. Several approaches to argumentation
have been proposed: the definition of logics [10], detailed agent mental states
formalisation or the construction and structure of valid arguments [17].
Nevertheless, we believe that argumentation as distributed belief revision has
additional advantages over other argumentation protocol implementations.
This claim is supported by the fact that in our computational model the sup- 
porting arguments are automatically shared during co-operation. For instance, 
when a context independent conflict occurs, since the arguments in favour of the 
conflicting beliefs have already been exchanged, the FOR and REL conflict reso- 
lution strategies are immediately applied speeding up the process. Furthermore, 
argumentation as distributed belief revision allows pro-active context indepen- 
dent conflict resolution, since when this type of conflict occurs the agents accept 
the social conflict outcome while retaining their own individual perspectives. As 
a result, the smallest change will trigger the re-evaluation of the conflict status. 

The implemented autonomous belief revision strategies were also built on
the proposals of other authors. The idea of using endorsements for determining
preferred revisions was first proposed by Cohen [1] and, later, by Galliers in
[5]. However, while Galliers proposes several types of endorsements according to
the process of origin of the foundational beliefs (communicated, given, assumed,
etc.), we only consider two kinds of endorsement: perception or assumption. We
base this decision on the fact that beliefs result from some process of observation,
assumption or external communication, and since communicated perspectives
also resulted from some process of observation, assumption or communication
of other agents, ultimately, the foundations set of any belief is solely made of
observed and assumed beliefs.

Similarly, Gaspar [6] determines the preferred belief revisions according to a
belief change function which is based on a set of basic principles and heuristic
criteria (sincerity of the sending agent, confidence and credulity of the receiving
agent) that allow the agents to establish preferences between contradictory
beliefs. Beliefs are associated with information topics and different credibility values
are assigned to the agents in these topics. More recently, Dragoni and Giorgini
[4] proposed the use of a belief function formalism to establish the credibility of
the beliefs involved in conflicts according to the reliability of the source of belief
and to the credibility of the involved beliefs. Each agent is assigned a global
reliability value which is updated by the belief function formalism after each conflict.
Our agents, like Gaspar's, have domain dependent reliability values because we
believe that agents tend to be more reliable in some domains than others and
that the assignment of a global agent reliability (like Dragoni and Giorgini do)
would mask these different expertise levels. We also guarantee, unlike Dragoni
and Giorgini, that communicated foundations are only affected by the reliability
of the foundation source agent, and not by the reliability of the agent that
communicated the foundation (which may be different).

Finally, a quite interesting approach is presented by [9] where argumenta- 
tion is also regarded as an extension to an existing distributed computational 
model. The authors regard argumentation as constraint propagation and use, 
consequently, a distributed constraint satisfaction computational model. 



6 Conclusion 

This research was motivated by the need to design a co-operative decentralised 
multi-agent system where dynamic, incomplete data could be adequately repre- 
sented, used and maintained. The idea of using argumentation as a distributed 
belief revision protocol to solve information conflicts was a natural consequence 
of the adopted approach since the autonomous agents were modelled as individual
reason maintenance systems. An important advantage of this approach lies in the
fact that arguments are automatically exchanged during co-operation and
are already available when conflicts occur. As a result, the declaration of conflicts 
results in the immediate application of the implemented evaluation strategies. 

The belief revision strategies designed for the resolution of the identified types 
of negative conflicts are based on data dependent features or explicit knowledge 
rather than belief semantics. The data features used to solve conflicting beliefs 
have been previously proposed by other authors (source agent reliability by Dragoni
and Giorgini [4], endorsements by Galliers [5], and specification of preferences
between beliefs by Gaspar [6]). However, we believe that we are combining them
in a novel and more efficient manner.

These argumentation protocols, which are implemented in the DIPLoMAT 
system, are under test and evaluation. They attempt to solve the detected con- 
flicting beliefs but cannot, beforehand, guarantee whether their effort will be 
successful or not. 
Abstract. In this article we describe a multi-agent dynamic scheduling
environment where autonomous agents represent enterprises and manage the 
capacity of individual macro-resources in a production-distribution context. 
The agents are linked by client-supplier relationships and inter-agent
communication must take place. The model of the environment, the 
appropriate agent interaction protocol and a cooperative scheduling approach, 
emphasizing a temporal scheduling perspective of scheduling problems, are 
described. The scheduling approach is based on a coordination mechanism 
supported by the interchange of certain temporal information among pairs of 
client-supplier agents involved. This information allows the agents to locally 
perceive hard global temporal constraints and recognize non over-constrained 
problems and, in this case, rule out non temporally-feasible solutions and 
establish an initial solution. The same kind of information is then used to 
guide re-scheduling to repair the initial solution and converge to a final one. 

Keywords. Scheduling, Multi-Agent Systems, Supply-Chain Management. 



1 Introduction 

Scheduling is the allocation of resources over time to perform a collection of
tasks, subject to temporal and capacity constraints. In classical/Operations Research
(OR) scheduling approaches, a centralized perspective is assumed; all problem data
is known by a central entity and scheduling decisions are taken by the same entity,
based on well-defined criteria. Sometimes, in more modern Artificial Intelligence
(AI), or mixed OR/AI, based approaches, the same kind of centralized perspective is
assumed too. For a general introduction to OR approaches to scheduling problems
see [9] or [2]; AI based approaches can be found in [5], [25] or [10], for instance.
Planning and coordination of logistics activities has been, in the areas of
OR/Management Science, the subject of investigation since the sixties [8]. The
problem of scheduling in this kind of environment has recently received more
dedicated attention; see [6], [1], [11] or [15], for instance. In this article, the specific
logistics context of the supply-chain/Extended Enterprise (EE) [14] is considered,
for the short-term activity of scheduling of production-distribution tasks. The EE is
usually assumed to be a kind of cooperative Virtual Organization, or Virtual
Enterprise, where the set of inter-dependent participant enterprises is relatively
stable; for concepts, terminology and classification see [3]; in [4] other approaches to
scheduling in this kind of context can be found. A distributed approach is more
natural in this case, because scheduling data and decisions are inherently distributed,
as resources are managed by individual, geographically decentralized and
autonomous entities (enterprises, organizations). So, in our approach, we adopted
the AI Multi-Agent Systems paradigm (see [13] or [24]). In the following, we
describe ongoing investigation developing from work published on the subject of
multi-agent scheduling in production-distribution environments (see [16], [17], [18],
[19], [20] and [21]).

The scheduling problems we consider have the following features: 

a) Communication is involved - Agents must communicate (at least to exchange 
product orders with clients/suppliers); 

b) Cooperation is involved - Each agent must cooperate so that it won’t invalidate 
feasible scheduling solutions; 

c) Scheduling activity is highly dynamic - Problem solution development is an 
on-going process during which unforeseen events must always be 
accommodated, and re-scheduling and giving up a scheduling problem are 
options to be considered. 

The following sections present: a brief description of the model of the 
multi-agent scheduling environment (sec. 2), the agent interaction protocol (sec. 3), 
the cooperative approach proposed for multi-agent scheduling problems (sec. 4), the 
initial scheduling step (sec. 5) and the re-scheduling (sec. 6), both from an individual 
agent perspective, and finally, future work and conclusions (sec. 7). Secs. 5 and 6 are 
presented with examples based on simulations. 



2 The Scheduling Environment Model 

In past work (referred to above) we have proposed a model of the EE scheduling
environment based on an agent network, with each agent managing an aggregate 
scheduling resource, representing a production, a transportation, or a store resource, 
and linked through client-supplier relationships. A scheduling resource is just an 
individual node of a physical network of resources, and accommodates the 
representation of the agent tasks scheduled and the available capacity along a 
certain scheduling temporal horizon. Ordinary network agents are named capacity, 
or manager, agents, because they are responsible for managing the capacity of a 
resource, and they can be production, transportation or store class agents; 
production and transportation class agents are grouped under the processor agent 
class, because the capacity they manage is based on a rate. A special supervision 
agent, with no resource, plays the role of an interface with the outside, and can 
introduce new scheduling problems into the agent network. 
Pairs of client-supplier capacity agents can communicate, basically through the
exchange of product request messages (see next section), which contain product, 
quantity of product and proposed due-date information. The supervision agent 
communicates with special agents playing the roles of retail and raw-material 
agents, located at the downstream and the upstream end of the agent network (which 
are pure clients and pure suppliers for the network), respectively. 

A scheduling problem is defined by a global product request from the outside of 
the agent network (i.e., a request of a final product of the network), the global 
due-date DD, and the global release date RD. These two dates are the limits of the 
scheduling temporal horizon and are considered the hard global temporal constraints 
of the problem. The supervision agent introduces a new scheduling problem by
communicating a global product request from outside to the appropriate retail agent
(networks can be multi-product, so the set of deliverable products can depend on the
retail agent); later, after the capacity agents have propagated among themselves,
upstream through the agent network, the necessary local product requests, the
supervision agent will collect global product requests to outside from raw-material
agents. A scheduling problem will cease to exist in the network when the time of the
last of the local due-dates comes, or if some local requests are rejected, or accepted
and then canceled (the supervision agent knows this as messages of satisfaction,
rejection and cancellation will be propagated to retail and raw-material agents and
then communicated to it).

In order to satisfy a product request from a client a capacity agent must schedule 
a task on its resource. A task needs a supply of one (in the case of a store or 
transportation agent), or one or more products (in the case of a production agent and 
depending on the components/materials for the task product). For those supplies the 
agent must send the appropriate product requests to the appropriate suppliers.* The 
task consumes a non-changing amount of resource capacity during its temporal 
interval. The duration of the task depends on the product, the quantity of product 
and additional parameters related to the characteristics of the resource and, in the 
case of processor agents, to the amount of capacity dedicated to the task.^ The 
details of these latter parameters are omitted here to allow simplicity of explanation, 
and we will assume a non-changing duration for all tasks, except for store tasks 
(with flexible duration, and minimum of 1 time unit). 

Although tasks are private to the agents (only the communication messages are
perceived by more than one agent), we can view the set of tasks that some agents of
the network schedule to satisfy a global product request from outside, as a whole, as
belonging to a network job; see the example in Fig. 1. This is just an analogue of the
concept of job used in classical production scheduling problems.



* We assume that there is, for each capacity agent, a unique supplier for each supply product.
As a result of this simplifying assumption, the lack of a product supply has, as a final
result, the network being unable to satisfy a global product request from the outside.
Allowing multiple suppliers for the same supply product opens the door to other issues
(like choosing the preferred supplier, possibly with negotiation based on prices, or
due-dates), in which we are not interested, for now.

^ Basically, more capacity invested gives a shorter task duration. 
A solution for a scheduling problem is a set of product requests agreed by pairs of
client-supplier agents and the set of agent tasks, necessary to satisfy the global
product request given by the problem, forming the corresponding network job. A
feasible solution has neither temporal nor capacity conflicts, i.e., it respects both all
temporal and all capacity constraints. For capacity constraints to be respected, no
capacity over-allocation must occur with any task of the solution, for any agent, at
any moment of the scheduling temporal horizon. For temporal constraints to be
respected, all local product requests of the solution must fall within the global
release date and global due-date of the problem; also, for each agent, the interval of
its task must fall in between the due-date agreed for the client request and (the latest
of) the due-date(s) agreed for the request(s) made to the supplier(s), and the latter
due-date(s) must precede the former.

[Fig. 1 (network job example; caption partly lost): ... i,r); P, T and s denote production, transportation and store tasks, respectively.]

More details on the resources and the physical network are given in [16] and [18];
about the agent network and agent architecture see [17], [18], and [19].



3 The Agent Interaction Protocol 

In this section we present the high-level inter-agent protocol used for scheduling in
the EE network.

The agent interaction activity for scheduling occurs through the interchange of 
messages, of predetermined types, between pairs of client-supplier agents. The 
exchange of a message always occurs in the context of a conversation between the 
sender and the receiver agents. A conversation has a conversation model, which 
contains information about the predetermined sequences of the types of messages 
exchangeable and is defined through a finite state machine. An interaction protocol 
is defined as a set of conversation models. For the interaction of capacity agents we
defined the Manager-Manager interaction protocol, represented in Fig. 2.
[Fig. 2, panels a) and b): finite state machine diagrams of the Request-from-Client
and Request-to-Supplier conversation models; transitions are labelled with message
types (receive / send).]

c) Message types and description.

request - product request, sent by a client agent to a supplier agent.

rejection - rejection of a previously received product request, sent by the supplier
to the client.

acceptance - acceptance of a previously received product request, sent by the
supplier to the client.

re-request - re-scheduling request, sent by the supplier (client) to the client
(supplier), asking to re-schedule a previously accepted product request to a given
due-date.

re-rejection - rejection of a previously received re-scheduling request, sent by the
receiver to the sender of the re-scheduling request.

re-acceptance - acceptance of a previously received re-scheduling request, sent by
the receiver to the sender of the re-scheduling request.

cancelation - signals giving up a previously accepted product request, sent by the
supplier (client) to the client (supplier).

satisfaction - signals delivery of a previously accepted product request, sent by
supplier to client at the time of the due-date of the product request.

Fig. 2. Conversation models and messages for the Manager-Manager interaction protocol. 

The protocol has the associated conversation models Request-from-Client and 
Request-to-Supplier, described as finite state machine diagrams in Fig. 2-a and Fig. 
2-b, and to be used by an agent when playing the roles of a supplier and a client 
agent, respectively. Fig. 2-c describes the types of messages exchangeable. 
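
As an illustration of a conversation model defined through a finite state machine,
the following sketch encodes one possible transition table over the message types
listed in Fig. 2-c. The state names and the transitions themselves are assumptions
inferred from the message descriptions (for example, that a rejected re-scheduling
request leaves the original request accepted); the actual models are those of
Fig. 2-a and Fig. 2-b, which are not reproduced here.

```python
# Hypothetical conversation model: (current state, message type) -> next state.
TRANSITIONS = {
    ("start", "request"): "requested",
    ("requested", "acceptance"): "accepted",
    ("requested", "rejection"): "rejected",
    ("accepted", "re-request"): "re-requested",
    ("re-requested", "re-acceptance"): "accepted",
    ("re-requested", "re-rejection"): "accepted",  # assumption: old agreement stands
    ("accepted", "cancelation"): "cancelled",
    ("accepted", "satisfaction"): "satisfied",
}

def run_conversation(messages, state="start"):
    """Replay a sequence of message types and return the final state."""
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"message {message!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state

final_state = run_conversation(
    ["request", "acceptance", "re-request", "re-rejection", "satisfaction"]
)
print(final_state)
```

A table-driven encoding keeps the predetermined message sequences explicit, so
an out-of-sequence message is detected as soon as it is received.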



4 A Cooperative Multi-agent Scheduling Approach 

Classically, scheduling is considered a difficult problem [?]. In general, solutions for 
a scheduling problem have to be searched for, and the search space can be very 
large. Additionally, for a multi-agent scheduling problem, a part of the effort is 
invested in coordination of the agents involved which, in our case, means sharing 
information through message exchange. Message exchange is considered costly, so
methods of pruning the search space for finding at least a first feasible solution with
minimal coordination efforts are considered satisfactory.

The approach we propose in this article, for the cooperative individual (agent)
scheduling behavior, is a minimal approach, that is, an agent viewing a scheduling
problem solution as feasible will do nothing with respect to the scheduling problem.
A first version of this approach appeared in [20]; in [21] we presented a refined 
approach only for processor, i.e., production or transportation, agents; in the present 
article we cover also store agents and, additionally, include the respective minimal 
re-scheduling actions for processor and store agent cases. 

Consider two sets of solutions for a scheduling problem: the set of time-feasible 
solutions, which respect all temporal constraints, and the set of resource-feasible 
solutions, which respect all capacity constraints. A feasible solution is one that is
both time-feasible and resource-feasible, so the set of feasible solutions is the
intersection of those two sets. A problem is temporally over-constrained if the set of 
time-feasible solutions is empty, and is resource over-constrained if the set of 
resource-feasible solutions is empty. If a problem has both non-empty time-feasible 
and resource-feasible solution sets, and their intersection is non-empty, then the 
problem has feasible solutions. We propose an approach using the following three 
step procedure, for each individual capacity agent: 

Step 1. Acceptance and initial solution - Detect if the problem is temporally
over-constrained, and if it isn’t, establish an initial solution, and proceed to the
next step; if it is, terminate the procedure by rejecting the problem, because it has
no feasible solution;

Step 2. Re-schedule to find a time-feasible solution - If the established solution is
time-feasible, proceed to the next step; if it isn’t, re-schedule to remove all
temporal conflicts;

Step 3. Re-schedule to find a feasible solution - For a resource-feasible solution, 
terminate the procedure; otherwise, try to re-schedule to remove all capacity 
conflicts without creating temporal conflicts, resorting to cancellation, with task 
un-scheduling, as a last choice, if necessary. 

Like some approaches in the literature, this procedure starts by establishing one
initial, possibly non-feasible, solution which is then repaired in order to remove
conflicts; see, for instance, [12]. Steps 1 and 2 of the procedure are oriented towards a
temporal scheduling perspective and concern only a single scheduling problem of
the agent. Step 3 is oriented towards a resource scheduling perspective and can involve
all scheduling problems of the agent, as all tasks of the agent compete for
the agent resource capacity.
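
The control flow of the three-step procedure can be sketched as below. Every
callable passed to the function is a hypothetical placeholder for behaviour that the
following sections describe (or that is left open here); only the ordering of the
steps is taken from the text.

```python
def three_step_procedure(problem, is_over_constrained, initial_solution,
                         is_time_feasible, repair_temporal,
                         is_resource_feasible, repair_capacity):
    """Run the three steps for one capacity agent; all helpers are placeholders."""
    # Step 1: reject temporally over-constrained problems, else build a start point.
    if is_over_constrained(problem):
        return "rejected", None
    solution = initial_solution(problem)
    # Step 2: remove temporal conflicts, if any.
    if not is_time_feasible(solution):
        solution = repair_temporal(solution)
    # Step 3: remove capacity conflicts; None signals cancellation as a last choice.
    if not is_resource_feasible(solution):
        solution = repair_capacity(solution)
        if solution is None:
            return "cancelled", None
    return "scheduled", solution

outcome = three_step_procedure(
    problem={"slack": -1},
    is_over_constrained=lambda p: False,
    initial_solution=lambda p: dict(p),
    is_time_feasible=lambda s: s["slack"] >= 0,
    repair_temporal=lambda s: {**s, "slack": 0},
    is_resource_feasible=lambda s: True,
    repair_capacity=lambda s: None,
)
print(outcome)
```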



5 Step 1: Scheduling an Initial Solution 

We now show, through examples for processor and store class agents, how an agent 
can locally recognize a non temporally over-constrained problem and, in that case, 
contribute to establish an initial solution (step 1). 

For a processor agent, suppose an agent g7 has a scheduling problem with the
processor task $O^{7}_{1,14}$ of the network job in Fig. 1. In Fig. 3-a, a possible situation for
this task (which also represents a feasible solution for the problem) is represented on
a timeline. As g7 has two suppliers for the task, two requests to two suppliers are
shown, $d^{7,8}_{1,14}$ and $d^{7,9}_{1,14}$, besides the request from the client, $d^{4,7}_{1,14}$. Intervals
(denoted by Ihi and IHI symbols) and temporal slacks are shown in Fig. 3-a. Symbols
fij, fim, FEJ, FEM, FJ and FM denote, respectively, the internal downstream
slack, internal upstream slacks, external downstream slack, external upstream
slacks, downstream slack and upstream slacks. For each kind of upstream slack
there is one per supplier (slacks are represented by arrows). By definition:

$FJ^{7}_{1,14} = FEJ^{7}_{1,14} + fij^{7}_{1,14}$, $fij^{7}_{1,14} = TIME(d^{4,7}_{1,14}) - ENDTIME(O^{7}_{1,14})$

$FM^{7,j}_{1,14} = FEM^{7,j}_{1,14} + fim^{7,j}_{1,14}$, $fim^{7,j}_{1,14} = STARTTIME(O^{7}_{1,14}) - TIME(d^{7,j}_{1,14})$

(j = 8, 9)

Internal slacks are inserted locally, on the initiative of the agent, when scheduling
the task and making requests to suppliers; external slacks are imposed by the other
agents of the network. It is assumed that, in any case, the agent will maintain
non-negative internal slacks. Each of the Ihi's is an interval between one of the supplier
due-dates and the client due-date (13 and 19, and 12 and 19, in Fig. 3-a); each of the
IHI's is an interval between one of the earliest start times and the latest finish time for
the task (10 and 21, and 11 and 21, in Fig. 3-a). The temporal points of each of the
latter pairs are hard temporal constraints for one of the former pairs of due-dates. Also,
the temporal end points of the most restrictive IHI interval (11 and 21 in Fig. 3-a) are
hard temporal constraints for the task; let us denote these by $RD^{7}_{1,14}$ and $DD^{7}_{1,14}$
for the upstream and downstream point, respectively.

[Fig. 3 panels: a) Processor scheduling problem. b) Store scheduling problem.]

Fig. 3. Scheduling problem parameters: a) for processor agent g7, and b) for store agent g1
(for the values in the timelines of these two cases no relationship is intended).
It is easy to see that, in order for a solution to be time-feasible, the interval of the
task must be contained in the most restricted Ihi interval, and each Ihi interval must be 
contained in the corresponding (same supplier) IHI interval. For this to hold, no 
temporal slack can be negative. Also, if the duration of the most restrictive IHI 
interval is less than the task duration, the problem is temporally over-constrained, 
and the agent can reject it. 

We propose that product request messages from the client, in addition to product
request information, carry the value of the FEJ slack; also, request acceptance
messages from suppliers will carry the value of the respective FEM slack. In our
example, agent g7 would then calculate the $DD^{7}_{1,14}$ and $RD^{7}_{1,14}$ values by:

$DD^{7}_{1,14} = TIME(d^{4,7}_{1,14}) + FEJ^{7}_{1,14}$ and $RD^{7}_{1,14} = \max_{j=8,9}(RD^{7,j}_{1,14})$

where: $RD^{7,j}_{1,14} = TIME(d^{7,j}_{1,14}) - FEM^{7,j}_{1,14}$ (j = 8, 9)

When the agent receives request $d^{4,7}_{1,14}$ from the client, it will, guaranteeing
non-negative values of fij and fim for the task to be scheduled, make requests
$d^{7,8}_{1,14}$ and $d^{7,9}_{1,14}$ to suppliers, passing them also the (supplier FEJ) value
$FJ^{7}_{1,14} + fim^{7,j}_{1,14}$ (for j = 8, 9). When the agent receives all the request acceptances
from the suppliers, it first verifies whether the problem is temporally over-constrained, by
testing if $DD^{7}_{1,14} - RD^{7}_{1,14} < DURATION(O^{7}_{1,14})$. If this is true the problem must
be rejected. Otherwise, the agent will send the acceptance message to the client,
passing it also the (client FEM) value $FM^{7}_{1,14} + fij^{7}_{1,14}$, where
$FM^{7}_{1,14} = STARTTIME(O^{7}_{1,14}) - RD^{7}_{1,14}$. If step 1 concludes with a non temporally
over-constrained problem, agent g7 will internally keep the tuple

$\langle DD^{7}_{1,14}, \{RD^{7,8}_{1,14}, RD^{7,9}_{1,14}\}, d^{4,7}_{1,14}, \{d^{7,8}_{1,14}, d^{7,9}_{1,14}\}, O^{7}_{1,14} \rangle$,

which represents the agent's local perspective of the scheduling problem, and also
includes (a part of) the initial solution.
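
The step 1 computations for a processor agent can be sketched as follows. The due-dates
(19, 13 and 12) and the hard bounds (21, 10 and 11) are the Fig. 3-a values quoted
above, from which FEJ = 2 and FEM = 3 and 1 follow if the intervals are paired in the
order in which they are listed (an assumption). The task start and end times (14 and
18), the assignment of the two suppliers to the labels 8 and 9, and all function and
key names are hypothetical.

```python
def processor_view(client_due, fej, supplier_due, fem, start, end):
    """Return the local temporal view of a processor agent for one task.

    supplier_due and fem map a supplier label to the due-date agreed with
    that supplier and to the FEM slack received in its acceptance message.
    """
    dd = client_due + fej                          # DD = TIME(client request) + FEJ
    rd_by_supplier = {j: supplier_due[j] - fem[j] for j in supplier_due}
    rd = max(rd_by_supplier.values())              # most restrictive upstream bound
    fij = client_due - end                         # internal downstream slack
    fim = {j: start - supplier_due[j] for j in supplier_due}  # internal upstream slacks
    fj = fej + fij                                 # downstream slack
    return {
        "DD": dd,
        "RD": rd,
        "FJ": fj,
        "FM": {j: fem[j] + fim[j] for j in supplier_due},
        "supplier_FEJ": {j: fj + fim[j] for j in supplier_due},  # sent with requests
        "client_FEM": (start - rd) + fij,          # sent with the acceptance
        "over_constrained": dd - rd < end - start,
    }

view = processor_view(client_due=19, fej=2,
                      supplier_due={8: 13, 9: 12}, fem={8: 3, 9: 1},
                      start=14, end=18)
print(view)
```

With these values no slack is negative and the most restrictive interval (11 to 21)
is longer than the task, so the agent would accept the problem.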

For a store agent, suppose an agent g1 has a scheduling problem with the store
task $O^{1}_{1,14}$ of the network job in Fig. 1. In Fig. 3-b, a possible situation for this task
(which also represents a feasible solution for the problem) is represented on a
timeline. The case is similar to the one for agent g7, with the exceptions described
in the following. There is only one request to a supplier, as g1 is a store agent. The
internal slacks are defined differently: the task interval is equal to the (unique) Ihi
interval, and part of the task duration is considered as internal slack (if its duration is
greater than 1, which is the minimum assuming the task must exist), with the
relationship $fij^{1}_{1,14} + fim^{1}_{1,14} = DURATION(O^{1}_{1,14}) - 1$ always holding. Fig. 3-b
suggests symmetrical definitions for fij and fim slacks, with the minimum
duration interval "centered" in the Ihi interval but, for purposes of temporal constraint
violation identification (see next section), the following definitions must be used.
For the downstream side violations (cases 1 and 3, in the next section), fij and
fim are defined by (the minimum duration interval is shifted to the left):

$fij^{1}_{1,14} = DURATION(O^{1}_{1,14}) - 1$, and $fim^{1}_{1,14} = 0$

For the upstream side violations (cases 2 and 4, in the next section), fij and
fim are defined by (the minimum duration interval is shifted to the right):

$fij^{1}_{1,14} = 0$, and $fim^{1}_{1,14} = DURATION(O^{1}_{1,14}) - 1$

The problem is temporally over-constrained if $DD^{1}_{1,14} - RD^{1}_{1,14} < 1$. The values
of $FEJ^{1}_{1,14} + DURATION(O^{1}_{1,14}) - 1$ and $FEM^{1}_{1,14} + DURATION(O^{1}_{1,14}) - 1$ must be
passed to the supplier and to the client (as the supplier FEJ value, and the client
FEM value), respectively. Finally, if step 1 concludes with a non temporally

over-constrained problem, agent gi will keep the tuple <Dd} 44 , {rd* 44 } ,(d}^j 4 , 
{di44 } d0»l44>. 
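
The store agent rules can be sketched in the same spirit. The function names and
the string labels for the violation side are hypothetical; the formulas follow the
definitions given above for a store task with a minimum duration of 1 time unit.

```python
def store_internal_slacks(duration, violation_side):
    """Return (fij, fim) for a store task, depending on the side being checked."""
    if duration < 1:
        raise ValueError("a store task needs a duration of at least 1 time unit")
    spare = duration - 1  # part of the duration that counts as internal slack
    if violation_side == "downstream":  # cases 1 and 3: minimum interval shifted left
        return spare, 0
    if violation_side == "upstream":    # cases 2 and 4: minimum interval shifted right
        return 0, spare
    raise ValueError(f"unknown side: {violation_side!r}")

def store_over_constrained(dd, rd):
    """A store problem is temporally over-constrained if DD - RD < 1."""
    return dd - rd < 1

fij, fim = store_internal_slacks(duration=5, violation_side="downstream")
print(fij, fim, store_over_constrained(dd=20, rd=12))
```

In both variants fij + fim equals the task duration minus 1, as required by the
relationship stated in the text.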

6 Step 2: Re-scheduling for a Time-Feasible Solution 

In this section we show how, starting from an initial solution with temporal
conflicts, agents g7 and g1 can locally contribute to obtaining a time-feasible solution
(step 2).

For the local situations represented in Fig. 3-a and Fig. 3-b, all slacks are positive,
so the solution is seen as temporally-feasible (by agent g7, and agent g1,
respectively). In these cases, an agent will do nothing, unless it receives a
re-scheduling request, which it could accept provided non-negative internal slacks
can be maintained, for a processor agent, or a task with duration greater than 0 is
possible, in the case of a store agent. Otherwise, four kinds of possible local
situations can occur where the agent itself must take the initiative of some
re-scheduling actions.

The situations referred to are described by the re-scheduling cases 1, 2, 3 and 4, for
which we show examples in Fig. 4 for processor agents (for agent g7), and in Fig. 5
for store agents (for agent g1). Each figure represents, for each case: a) the situation
detected, and b) the situation after a minimal re-scheduling action. No relationship
is intended among timeline values of the processor and the store agent cases. In the
text following, upper indexes are omitted in slack symbols in order to cover both
processor and store agent cases. Cases 1 and 2 must be considered first, by the
agent.

Case 1 occurs if FJ < 0 and FEJ < 0 (the task and the client request violate
the hard temporal constraint downstream, 17 in Fig. 4-1-a, and 20 in Fig. 5-1-a). The
detection of case 1 must be followed by the appropriate task re-scheduling and client
request re-scheduling to earlier times (resulting in the situation shown in Fig. 4-1-b,
and Fig. 5-1-b). Re-scheduling of some requests to suppliers may or may not then be
necessary to maintain non-negative fim slacks at the upstream side.

Case 2 occurs if, for some supplier, FM < 0 and FEM < 0 (the task and
some requests to suppliers violate hard temporal constraints upstream, 16 in Fig.
4-2-a, and 29 in Fig. 5-2-a). The detection of case 2 must be followed by the
appropriate task re-scheduling and the re-scheduling of the offending requests to
suppliers to later times (resulting in the situation shown in Fig. 4-2-b, and Fig.
5-2-b). Re-scheduling of the client request may or may not then be necessary to
maintain a non-negative fij slack at the downstream side.

Fig. 4. Examples of re-scheduling cases 1, 2, 3 and 4, for a processor agent, with situations
before, and after, minimal re-scheduling.

After handling cases 1 and 2, cases 3 and 4 are handled. 

Case 3 occurs if FEJ < 0 (the client request violates the hard temporal
constraint downstream, 19 in Fig. 4-3-a, and 26 in Fig. 5-3-a). The detection of case
3 must be followed by the appropriate client request re-scheduling to an earlier time
(resulting in the situation shown in Fig. 4-3-b, and Fig. 5-3-b).

Fig. 5. Examples of re-scheduling cases 1, 2, 3 and 4, for a store agent, with situations
before, and after, minimal re-scheduling.

Case 4 occurs if, for some supplier, FEM < 0 (some requests to suppliers
violate hard temporal constraints upstream, 13 in Fig. 4-4-a, and 23 in Fig. 5-4-a).
The detection of case 4 must be followed by the appropriate re-scheduling of the
offending requests to suppliers to later times (resulting in the situation shown in Fig.
4-4-b, and Fig. 5-4-b).
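
The detection conditions of the four cases can be summarised in a short sketch. It is only an illustration under stated assumptions: the `Slacks` container and the `detect_cases` function are hypothetical names, slacks are taken as plain integers with upper indexes omitted, case 3 is read as FEJ < 0, and cases 3 and 4 are reported only for violations not already covered by cases 1 and 2. The re-scheduling actions themselves are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Slacks:
    """Slack values seen by one agent (hypothetical container, upper indexes omitted)."""
    FJ: int                                   # downstream slack associated with the task
    FEJ: int                                  # downstream slack associated with the client request
    FM: dict = field(default_factory=dict)    # supplier -> upstream slack associated with the task
    FEM: dict = field(default_factory=dict)   # supplier -> upstream slack of the request to that supplier


def detect_cases(s: Slacks) -> list:
    """Return the re-scheduling cases detected, in the order they must be handled."""
    cases = []
    # Cases 1 and 2 are considered first.
    if s.FJ < 0 and s.FEJ < 0:
        cases.append(1)
    if any(s.FM.get(k, 0) < 0 and s.FEM.get(k, 0) < 0 for k in s.FM):
        cases.append(2)
    # Cases 3 and 4 cover violations of the requests alone (assumption of this sketch).
    if 1 not in cases and s.FEJ < 0:
        cases.append(3)
    if any(v < 0 and not s.FM.get(k, 0) < 0 for k, v in s.FEM.items()):
        cases.append(4)
    return cases
```

In an actual agent the slacks would be recomputed after cases 1 and 2 have been handled, and the detection repeated for cases 3 and 4.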

7 Conclusion and Future Work

We described a multi-agent dynamic scheduling environment involving
communication and cooperation, and an approach for multi-agent cooperative
scheduling based on a three-step procedure for individual agents. Step 1 allows
agents to detect locally whether the problem is temporally over-constrained and, if
it is not, to schedule an initial, possibly non time-feasible, solution. By locally
exchanging specific temporal slack values, agents are able to locally perceive the
hard global temporal constraints of a problem, and rule out non time-feasible
solutions in the subsequent steps. Each of the slack values exchanged in step 1
corresponds, for a particular agent, to a sum of slacks downstream and upstream in the
agent network, so it cannot be considered private information of any particular
agent. In step 2, if necessary, agents repair the initial solution to obtain a
time-feasible one.

The procedure is very general with respect to its step 3. This step can be refined
to accommodate additional, improved coordination mechanisms for implementing
certain search strategies based on capacity/resource constrainedness (e.g., see [23]
or [22]), leading the agents to a fast convergence on specific feasible solutions, for
instance feasible solutions satisfying some scheduling preferences or optimizing
some criteria, either from an individual agent perspective, or from a global one, or
both. This is a subject for our future work.

Abstract. Electronic transactions are increasingly used because of their openness and
continuous availability. The rapid growth of information and communication
technologies has helped the expansion of these electronic transactions;
however, issues related to security and trust still limit their scope,
mainly in business-to-business activity. This paper introduces an
Electronic Institution framework to help manage electronic transactions by
making norms and rules available and by monitoring business participants’
behaviour in specific electronic business transactions. The Virtual Organisation
(VO) life cycle is used as a complex scenario encompassing electronic
transactions, in which Electronic Institutions help in both the formation and
operation phases. A flexible negotiation process that includes multi-attribute and
learning capabilities, as well as distributed dependencies resolution, is
proposed for VO formation. “Phased commitment” is another concept
introduced here for monitoring VO operation through the Electronic Institution.



1 Introduction 

An Electronic Institution is a framework for enabling, through a communication 
network, automatic transactions between parties according to sets of explicit 
institutional norms and rules. 

An Electronic Institution helps both by providing tools and services for, and by
supervising, the intended relationships between the parties.

Usually, parties engaged in electronic transactions and joint actions are mediated
by software agents. We therefore believe that an appropriate Electronic Institution can
be implemented as an agent-based framework where external agents can meet
according to a set of established and fully agreed mutual constraints.

A good example illustrating the need for an Electronic Institution can be found in a
software framework providing the automatic services needed to support the
Virtual Organisations’ life cycle.

A Virtual Organisation (VO) is an aggregation of autonomous and independent
organisations connected through a network (possibly a public network like the Internet)
and brought together to deliver a product or service in response to a customer need.
Virtual Organisation management should be supported by efficient information and
communication technology through its entire life cycle.

Tools for the Virtual Organisation formation process, through the use of an Electronic
Market providing enhanced protocols for appropriate negotiation, can easily be
accommodated in the services made available by the Electronic Institution. In our
approach, these services include automatic capabilities for adaptive bid formulation,
accepting qualitative feedback for the sake of keeping information as private as
possible to each of the negotiating agents, as well as multi-issue bid evaluation.

Contrary to other approaches [1, 2] that use a-priori fixed values for weighting
attributes’ values in the bid evaluation function, we here advocate weighting those
values, which reflect the deviation from the preferred values, according to the relative
importance of the attributes. Furthermore, the negotiation protocol we are
introducing, here called Q-Negotiation, includes the capability for agents that are
trying to supply mutually constrained items to deal with the respective
inter-dependencies while keeping their self-interestedness.

During the Q-Negotiation process, self-interested agents may have to agree on
supplying sets of mutually constrained items whose values, although not
optimal for each individual agent, correspond to a minimal joint
decrement of the agents’ maximum utility. However, it would not be fair for this
agreement, in the name of the best possible joint utility, to benefit one agent more than
the other. We therefore propose that, in the case of agreements based on the agents’
joint utility, the agents concerned should equally distribute the joint decrement of
their maximum utility. Through the calculation of appropriate compensations, agents
that benefit the most know they have to transfer some utility to those who
benefit the least.

The outcome of the Virtual Organisation formation stage is, for our proposed
Electronic Institution, a “phased commitment” through which different parties
commit themselves to specific future actions. We intend to link commitments, which
are the result of previous negotiation during the Virtual Organisation formation stage,
to the next stage (VO operation) through a monitoring process.

We also introduce the concepts of both Meta-Institution and Electronic Institution,
the latter being responsible for making available functionalities including an adaptive
multi-criteria negotiation protocol together with features for resolving agents’
mutual dependencies.

This paper includes, besides the introduction, four more sections. The next section
introduces the concepts of Electronic Institutions and Meta-Institutions in the context
of Virtual Organisations. Section 3 details our proposed Q-Negotiation
algorithm used in the VO formation stage. Section 4 describes phased commitments
in the VO operation stage and, finally, section 5 gives some conclusions and
directions for future work.

2 Electronic Institutions and Meta-Institutions



2.1 Meta-Institutions 

It seems intuitive that when automated agents’ interactions become more
sophisticated and agents’ autonomy more evident, problems related to confidence,
trust and honesty may arise. Moreover, agreements such as deals made by different
companies or their delegates (automated or not) always call for a common and
unambiguous ground of understanding. We need to design a framework, accepted by all
the parties, to encompass all the automatic activities that are going to take place
between agents representing different individual or collective entities.

An Electronic Institution is a framework for enabling, through a communication 
network, automatic transactions between parties according to sets of explicit 
institutional norms and rules. 

An Electronic Institution helps both by providing tools and services for, and by
supervising, the intended relationships between the parties.

However, each Electronic Institution will be, at least partially, dependent on the
specific application domain of business it has been designed for. Our proposal starts
with the definition of a Meta-Institution, which has to be independent of the
application domain.

We see a Meta-Institution as a shell for generating specific Electronic Institutions.
A Meta-Institution is a set of electronic facilities to be requested and used in order to
help create suitable Electronic Institutions according to a set of established
social rules and norms of behaviour. Those rules of behaviour should apply to many
different kinds of (automatic) entities interacting through the web. Besides enforcing
those general rules in social interactions, a Meta-Institution also provides tools
for some important stages of the interaction process.

The main goal of a Meta-Institution is, however, to make available
suitable Electronic Institutions that will live throughout a particular business process in
a specific application domain.

Electronic transactions between distributed and autonomous entities are increasingly
mediated by software agents. We therefore believe that an appropriate
Electronic Institution can be implemented as an agent-based framework where
external agents can meet according to a set of established and fully agreed
norms, rules, and mutual constraints.



2.2 Virtual Organisations and Electronic Institutions 

A good example illustrating the need for an Electronic Institution (EI) can be found in
providing the automatic services needed to support the Virtual Organisations’ life
cycle.

This aggregation of autonomous organisations is advantageous in the sense that it
will reduce complexity (today’s products and services are increasingly complex and
require close coordination across many different disciplines) and, most importantly,
will enable the response to rapidly changing requirements. A Virtual Organisation will
only exist temporarily, for the time needed to satisfy its purpose.

The VO life cycle is decomposed into four phases [3, 4] that will also be reflected in
the EI framework:

1. Identification of Needs: Appropriate description of the product or service to be
delivered by the VO, which guides the conceptual design of the VO.

2. Formation (Partners Selection): Automatic selection of the individual
organisations (partners) which, based on their specific knowledge, skills, resources,
costs and availability, will integrate the VO.

3. Operation: Control and monitoring of the partners’ activities, including resolution 
of potential conflicts, and possible VO reconfiguration due to partial failures. 

4. Dissolution: Breaking up the VO, distribution of the obtained profits and storage of 
relevant information for future use of the Electronic Institution. 

We expect from a Meta-Institution, the application-independent framework we
defined in section 2.1, the capability to directly help in the “Identification of
Needs” stage of the VO life cycle, as follows:

- First, helping agents to describe their needs in such a way that they can be understood
by other potential members that may later join a specific and appropriate Electronic
Institution, and

- Second, providing the searching tools to look for potential partners who know how
to satisfy those described needs.

The Meta-Institution will then generate the specific Electronic Institution for the
particular application domain through the instantiation of some modules according to
the explicit VO goals.

The general architecture and the role of a Meta-Institution in the
emergence of Virtual Organisations are depicted in Fig. 1.

Fig. 1. General architecture and role of a Meta-Institution

The Electronic Institution, which inherits from the Meta-Institution appropriate rules
as well as important links to other institutions that may play a crucial role throughout
the VO process, will provide the framework for dealing with, at least, the
two main stages of the VO life cycle: VO formation and VO operation monitoring.

We will describe in detail, in the next two sections, how an Electronic Institution
provides the means for supporting those two stages by making available an
agent-based electronic market tool.

Figure 2 shows the main modules of an Electronic Institution. It also
shows that the VO formation module makes available our Q-Negotiation protocol,
enabling several organisation agents (OAgt) to meet through a Market agent
(MAgt), leading to the selection of the VO partners. The VO operation module uses the
phased commitment mutually agreed by the partners at the end of the previous stage for
operation monitoring. The final module (VO dissolution) is mentioned here only for
the sake of completeness.




Fig. 2. General architecture and role of an Electronic Institution 



3 Advanced features for Negotiation in VO formation 

In the scenario of an agent-based Virtual Organisation formation process, the 
negotiation mechanism should enable the selection of the individual organisations 
that, based on their own competencies and availability, will constitute the optimal 
group to satisfy the previously described VO needs. In such a scenario, the adopted 
automatic negotiation mechanism has to be powerful enough to satisfy three 
important requirements: 

• Capabilities for negotiating about which are the most promising organisations that
should belong to the VO. This means that agents, as representatives of organisations,
have to negotiate over the goods or services those organisations are able to provide. In
realistic scenarios, goods/services are described through multiple attributes, which
implies that the negotiation process must be enhanced with the capability to both
evaluate and formulate multi-attribute based proposals.

• Agents that are willing to belong to the Virtual Organisation may compete among
themselves. An agent does not know the others’ negotiation strategies or even their
current proposals. Learning techniques can help agents to negotiate better in these
partially unknown environments, by reasoning about past negotiation episodes as well as
about the current one, thus improving their own behaviour.

• In the VO formation process, each one of the individual organisations will
contribute at least one of its own capabilities (good or service) to the VO. All
these contributions may be, and usually are, mutually dependent. The
negotiation process has to be able to deal with those inter-dependencies, reaching a
coherent solution as the final one to be accepted by all the agents.

An important characteristic that must be considered in the VO scenario is the fact that
any organisation has as its main objective to maximize its own profit. In order to do
that, the negotiation process has to take into account agents’ individual rationality and
self-interestedness. Keeping information private prevents losing negotiation power
to competitors, since others will never know or be able to deduce how close they are
to another agent’s preferences.

It is our claim that our proposed negotiation algorithm can effectively deal with 
these three important requirements in the VO scenario while keeping agents’ 
information private as much as possible. In the next sections, we further detail the 
proposed model for agents’ negotiation. First, we present a formal description of our 
negotiation model. Then we describe how to specifically deal with each one of the 
three requirements mentioned above (multi-attribute bidding, adaptive bidding and 
mutual dependencies resolution). 



3.1 The Negotiation Model 

In the VO formation process, participants in the negotiation can be either market or 
organisation agents. The Market Agent plays the role of organizer, meaning that it is 
the agent that starts and guides all the negotiation process. The Organisation Agents 
play the role of respondents, meaning that they are those who are willing to belong to 
the future VO and, therefore, they have to submit proposals during the negotiation 
phase. 

In order to agree on a VO structure, agents (Market and several Organisations)
naturally engage in a sequential negotiation process composed of multiple
rounds of proposals (sent by Organisations to the Market) and counter-proposals, which
are actually comments on past proposals (sent by the Market to Organisations).

The Market Agent, playing a central role as organizer, models the negotiation
process through the $Neg^{MAgt}$ triplet as follows:

$Neg^{MAgt} = \langle Cmpt, LAgts, H \rangle$    (1)

where:

• Cmpt identifies the component under negotiation.

• LAgts is the list of respondent (organisation) agents that can provide component
Cmpt.

• H is the negotiation history. Each element of H contains information related to a
single negotiation round. Each negotiation round includes all proposals received
during that round.

$H = \{H_t\}, \quad H_t = \{\langle Prop_{t,i}, Eval_{t,i} \rangle\}$    (2)

$Prop_{t,i} = \{V_1, \ldots, V_n\}$

where:

• $Prop_{t,i}$ is the proposal sent by organisation agent i in negotiation round t.

• $V_x$ is the proposal’s value of attribute x.

• $Eval_{t,i}$ is the evaluation value of proposal $Prop_{t,i}$, from the Market Agent point of
view.

Each one of the Organisation Agents models the negotiation process through the
$Neg^{OAgt}$ n-tuple as follows:

$Neg^{OAgt} = \langle Cmpt, MAgt, H, Q \rangle$    (3)

where:

• Cmpt identifies the component under negotiation.

• MAgt identifies the Market Agent that is the organizer of this negotiation process.

• H is the negotiation history. Each element of H contains information related to a
single negotiation round. Each negotiation round includes the proposal sent by
each specific Organisation Agent to the Market Agent plus the feedback comment
received from the Market Agent.

$H = \{\langle Prop_t, Comment_t \rangle\}, \quad Comment_t \in \{winner, \langle Eval_{t1}, \ldots, Eval_{tn} \rangle\}$    (4)

where:

• $Prop_t$ is the proposal sent during negotiation round t.

• $Comment_t$ is the comment received from the Market Agent on proposal $Prop_t$. This
comment either indicates that the proposal is the winner in the current round
(winner) or includes a qualitative appreciation for each one of the attribute
values under negotiation.

• The Q parameter includes relevant information to be used by the learning
algorithm used for the next bid formulation. This particular topic is discussed in a later
section. At this point, it is only important to say that Q is described as follows:

$Q = \{Q_t\}, \quad Q_t = \{\langle State_t, Action_t, QValue_t \rangle\}$
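
The two negotiation records described in this section can be pictured as simple data structures. The sketch below is only illustrative: the class names `MarketNegotiation` and `OrganisationNegotiation`, the field names and the `record_round` helper are hypothetical, and proposals are assumed to be dictionaries from attribute names to numeric values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Proposal = Dict[str, float]             # attribute name -> proposed value
Feedback = Union[str, Dict[str, str]]   # "winner" or attribute -> qualitative comment


@dataclass
class MarketNegotiation:
    """Market Agent view: component, respondent agents, history of rounds."""
    cmpt: str
    lagts: List[str]
    # history[t][agent] = (proposal of that agent in round t, its evaluation)
    history: List[Dict[str, Tuple[Proposal, float]]] = field(default_factory=list)

    def record_round(self, proposals, evaluate):
        """Store one negotiation round: every proposal received plus its evaluation."""
        rnd = {agent: (prop, evaluate(prop)) for agent, prop in proposals.items()}
        self.history.append(rnd)
        return rnd


@dataclass
class OrganisationNegotiation:
    """Organisation Agent view: component, Market Agent, history, Q information."""
    cmpt: str
    magt: str
    # history[t] = (proposal sent in round t, comment received from the Market Agent)
    history: List[Tuple[Proposal, Feedback]] = field(default_factory=list)
    # q[(state, action)] = Q-value used by the learning algorithm
    q: Dict[tuple, float] = field(default_factory=dict)
```

Note that the Organisation Agent record holds only its own proposals and the comments it received, which reflects the privacy of information between competing agents.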



3.2 Multi-Attribute Bid Evaluation 

Negotiation implies, for the VO scenario as well as for most economic
transactions, taking into consideration not just one but multiple attributes for
defining the terms (goods/services) under discussion. For instance, although the price
of any good is an important (perhaps the most important) attribute, delivery
time or quality can also be, and generally are, complementary issues to include in the
decision on whether or not to buy/sell a specific good.

Attaching utility values to the different attributes under negotiation solves the problem
of multi-attribute evaluation. Generally, an evaluation formula is a linear combination
of the attributes’ values weighted by their corresponding utility values. In this way, a
multi-attribute negotiation is simply converted into a single-attribute negotiation, where
the result of the evaluation function can be seen as this single issue. Examples of this
method are presented in [1, 2].

However, in some cases, it can be difficult to specify absolute numeric values to
quantify the attributes’ utility. A more natural and realistic way is to simply impose a
preference order over attributes. The multi-attribute function presented in formula (5)
encodes the preferences over attributes and attributes’ values in a qualitative way and, at
the same time, accommodates attributes’ intra-dependencies.

$Ev = \frac{1}{Deviation}, \quad Deviation = \frac{1}{n} \sum_{i=1}^{n} \frac{i}{n} \, dif(PrefV_i, V_i)$    (5)

where n is the number of attributes that define a specific component, and

$dif(PrefV_i, V_i) = \frac{V_i - PrefV_i}{max_i - min_i}$, if the domain is continuous

$dif(PrefV_i, V_i) = \frac{Pos(V_i) - Pos(PrefV_i)}{nvalues}$, if the domain is discrete

A proposal’s evaluation value is calculated by the Market Agent as the inverse of
the weighted sum of the differences between the optimal ($PrefV_i$) and the real ($V_i$)
value of each one of the attributes. In the formula, the terms should be presented in
increasing order of preference, that is, attributes identified by lower indexes are less
important than attributes identified by higher indexes. The proposal with the highest
evaluation value so far is the winner, since it is the one that contains the attributes’
values closest to the optimal ones from the Market Agent point of view.
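
As an illustration, the evaluation described above can be sketched as follows. Several points are assumptions of this sketch rather than part of the model: the function names, the use of absolute differences (so that the deviation is never negative), the representation of a continuous domain as a (min, max) pair and of a discrete domain as an ordered list, and returning infinity for a proposal that matches all preferred values.

```python
def dif(pref, value, domain):
    """Normalised distance between the preferred and the proposed value of one attribute."""
    if isinstance(domain, tuple):          # continuous domain given as (min, max)
        low, high = domain
        return abs(value - pref) / (high - low)
    # discrete domain given as an ordered list of admissible values
    return abs(domain.index(value) - domain.index(pref)) / len(domain)


def evaluate(proposal, preferred, domains):
    """Evaluation value Ev = 1 / Deviation; attributes are ordered from least to most important."""
    n = len(proposal)
    weighted = sum(
        (i / n) * dif(p, v, d)
        for i, (v, p, d) in enumerate(zip(proposal, preferred, domains), start=1)
    )
    deviation = weighted / n
    return float("inf") if deviation == 0 else 1.0 / deviation


preferred = [10, "high"]                        # price (less important), quality (more important)
domains = [(0, 20), ["low", "medium", "high"]]
ev_a = evaluate([14, "high"], preferred, domains)     # deviates only on the less important attribute
ev_b = evaluate([10, "medium"], preferred, domains)   # deviates only on the more important attribute
```

With these illustrative values the first proposal receives the higher evaluation, because its deviation falls on the attribute with the lower index.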

The negotiation process is realized as a set of rounds in which Organisation Agents
concede a little more from round to round, trying to approach the Market Agent
preferences in order to be selected as partners of the VO. The Market Agent helps
Organisation Agents in their task of formulating new proposals by giving them some
hints about the direction they should follow in their negotiation space. These hints are
given by the Market Agent as comments about the attributes’ values included in current
proposals.

Qualitative Feedback Formulation 

The response to proposed bids is formulated by the Market Agent as qualitative
feedback, which reflects the distance between the values indicated in a specific
proposal and the best one received so far. The reason why the Market Agent
compares a particular proposal not with its optimal one but with the best one received
so far is that it is more convincing to tell an Organisation
Agent that there is a better proposal in the market than to say that its proposal is not
the optimal one.

Qualitative feedback is then formulated by the Market Agent as a qualitative
comment on each of the proposal’s attribute values, which can be classified in one of
three categories: sufficient, bad or very_bad.
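
A possible way of producing such comments is sketched below. The text does not state how distances are mapped to the three categories, so the thresholds `bad_from` and `very_bad_from`, the normalisation by an attribute span, and the function name are all hypothetical choices made for illustration only; attributes are assumed to be numeric.

```python
def qualitative_feedback(proposal, best, spans, bad_from=0.1, very_bad_from=0.5):
    """Comment each attribute by its normalised distance to the best proposal received so far."""
    comments = {}
    for attr, value in proposal.items():
        distance = abs(value - best[attr]) / spans[attr]
        if distance < bad_from:
            comments[attr] = "sufficient"
        elif distance < very_bad_from:
            comments[attr] = "bad"
        else:
            comments[attr] = "very_bad"
    return comments


feedback = qualitative_feedback(
    {"price": 100, "delivery": 12},     # proposal being commented
    {"price": 98, "delivery": 5},       # best proposal received so far
    {"price": 100, "delivery": 10},     # width of each attribute's domain
)
```

Only the categories are sent back to the Organisation Agent; the numeric distances and the best proposal itself stay with the Market Agent.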

Organisation Agents will use this feedback on their past proposals in
order to formulate, in the next negotiation rounds, new proposals that try to follow the
hints included in the feedback comments.

1 Each attribute $V_i$ may depend on several other attributes through function f.

3.3 Learning in Bid Formulation



The Q-Negotiation algorithm uses a reinforcement learning strategy based on
Q-learning for the formulation of new proposals. The Q-learning algorithm [6] is a
well-known reinforcement learning algorithm that maps evaluation values (Q-values) to
state/action pairs.

The selection of a reinforcement learning algorithm seems appropriate for the
negotiation process that leads to the VO formation, since organisation agents
evolve in an environment that is, at least partially, unknown. In particular, Q-learning
enables on-line learning, which is an important capability in our specific scenario,
where agents learn continuously throughout the negotiation process, with
information extracted from each one of the negotiation rounds, and not only at the end
with the negotiation result.

Q-learning is based on the idea of rewarding actions that produce good results and
punishing those that produce bad results, as indicated by parameter r in the
corresponding formula (see equation (6)).



$Q(s,a) = Q(s,a) + \alpha \left[ r + \gamma \max_{b} Q(s',b) - Q(s,a) \right]$    (6)

In the Q-Negotiation process, we assume that:

A state is defined by a set of attributes’ values, thus representing a proposal:

$s = (v_1, v_2, \ldots, v_n)$, n = number of attributes, $v_x$: value of attribute x

An action is a modification of the attributes’ values through
the application of one of the functions increase, decrease, or maintain:

$a = (f_1, f_2, \ldots, f_n)$, n = number of attributes, $f_x \in \{increase, decrease, maintain\}$
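
The update of equation (6), with states and actions represented as above, can be sketched as follows. The values of the learning rate and discount factor, the unit step applied by increase and decrease, and the function names are assumptions made for this illustration.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9     # learning rate and discount factor (illustrative values)
STEP = {"increase": 1, "decrease": -1, "maintain": 0}   # assumed unit modification


def apply_action(state, action):
    """Next state s' obtained by modifying each attribute value of state s."""
    return tuple(v + STEP[f] for v, f in zip(state, action))


def q_update(q, state, action, reward, candidate_actions):
    """One application of equation (6); q maps (state, action) pairs to Q-values."""
    nxt = apply_action(state, action)
    best_next = max((q[(nxt, b)] for b in candidate_actions), default=0.0)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    return nxt


q = defaultdict(float)                  # unseen (state, action) pairs start at 0
s = (5, 3)                              # a proposal with two attribute values
a = ("increase", "maintain")
s_next = q_update(q, s, a, reward=2.0, candidate_actions=[a])
```

Because the update is applied after every round, the agent learns on-line from each feedback comment rather than only from the final negotiation result.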






The adaptation of the Q-learning algorithm to our specific scenario, the VO
formation negotiation, leads to the inclusion of two important features that we briefly
describe in the next paragraphs and that are detailed elsewhere [6].

The reward value for a particular state is calculated according to the qualitative
feedback received from the Market Agent in response to the proposal derived from
this state (see formula (7)).

$r = n$, if winner

$r = \frac{n}{2} - \sum_{i} penalty_i$, if not winner ($0 \le penalty_i \le 1$)    (7)
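
A minimal sketch of the reward computation follows, reading formula (7) as r = n for a winning proposal and r = n/2 minus the sum of the penalties otherwise. The mapping from qualitative comments to penalty values is not given in the text, so the `PENALTY` table below is a hypothetical choice within the stated range of 0 to 1.

```python
# Hypothetical penalty per qualitative comment (each value lies between 0 and 1).
PENALTY = {"sufficient": 0.0, "bad": 0.5, "very_bad": 1.0}


def reward(feedback, n):
    """Reward for a proposal with n attributes, given the Market Agent feedback."""
    if feedback == "winner":
        return float(n)
    return n / 2 - sum(PENALTY[comment] for comment in feedback.values())


r_win = reward("winner", 3)
r_lose = reward({"price": "bad", "delivery": "sufficient", "quality": "very_bad"}, 3)
```

Under this reading a non-winning proposal never receives more than n/2, so winning a round is always rewarded more than any losing proposal.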



The exploration space, which can become very large and thus implies a long time
to learn, is reduced in order to include only those actions that can be considered
promising. A promising action is an action that, applied to a previous
state proposed to the Market Agent, follows the hints included in the feedback formulated
by this agent. As an example, if the Market Agent, in its feedback on a proposal, classifies
the value of attribute x as bad, one promising action would be to increase this
attribute a little and maintain all the others.
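
The reduction of the exploration space can be illustrated by generating only the actions that touch one criticised attribute. This is a sketch under assumptions: since qualitative feedback does not say in which direction an attribute should move, both increase and decrease are kept as candidates, and the function name is hypothetical.

```python
def promising_actions(attributes, feedback):
    """Actions that modify one criticised attribute and maintain all the others."""
    actions = []
    for target in attributes:
        if feedback.get(target) in ("bad", "very_bad"):
            for move in ("increase", "decrease"):
                actions.append(
                    tuple(move if attr == target else "maintain" for attr in attributes)
                )
    if not actions:
        # Nothing was criticised: the only candidate is to keep the proposal as it is.
        actions.append(tuple("maintain" for _ in attributes))
    return actions


candidates = promising_actions(
    ["price", "delivery"], {"price": "sufficient", "delivery": "bad"}
)
```

For n attributes the full action space has 3^n elements, whereas this restriction yields at most two candidates per criticised attribute.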

3.4 Distributed Dependencies Resolution

One of the requirements for the negotiation protocol we are proposing here, besides
dealing with attributes’ intra-dependencies, is the capability to deal with attributes’
inter-dependencies. This is an important requirement in our scenario,
because in the VO formation process interdependent negotiations take place
simultaneously, and proposals received from different organisation agents may have
incompatible dependent attributes’ values. Therefore, agents should negotiate in order
to agree among themselves on mutually admissible values, which can be seen as a
distributed dependencies satisfaction problem.

The distributed dependencies satisfaction problem has been the subject of attention
of other researchers, addressing both single [7] and multiple dependent
variables [8, 9, 10]. In the VO formation process, dependencies may occur between
multiple variables, making the latter approaches more relevant to our research. The
first two of these papers [8, 9] describe algorithms that reach one possible solution,
not the optimal one. The third paper [10] introduces an algorithm that, although
reaching the optimal solution, requires that all agents involved in the mutual
dependencies resolution process know all agents’ private utility functions.

Unlike all these proposals, our distributed dependencies satisfaction
algorithm, besides reaching the optimal solution, keeps agents’ information as
private as possible.

Each agent involved in the distributed dependencies resolution should know
its space of states, that is, all possible values for its own dependent attributes. Agents
then exchange among themselves alternative values for the dependent attributes in
order to approach an agreement. As in any iterative negotiation process, agents start
the negotiation by proposing their optimal (from a local point of view) solution and, in
the next rounds, start conceding in an attempt to reach a consensus.

In order to properly understand the way the algorithm works, we should first
introduce the concept of “decrement of the maximum utility” of an alternative state.
State transitions are due to the relaxation of one or more state variables. The decrement of
the maximum utility of a particular alternative proposal can be calculated as the
difference between the evaluation values of the optimal proposal and this alternative
one. We will abbreviate “decrement of the maximum utility” to “decrement of the
utility”, meaning the amount of utility an agent has to concede compared to
its (local) optimal bid. Formula (8) represents the decrement of utility for agent i
corresponding to a particular state l, where s* is the agent’s optimal state
(proposal).

$du_i^{l} = Ev_i(s^{*}) - Ev_i(s^{l})$    (8)




At each negotiation step, the agent selects as a new proposal the one that has the
lowest decrement of the utility among those not yet proposed. During the negotiation
process, agents do not reveal their own state’s utility, but only the state’s decrement of
utility, which keeps important information private.

This process ends when no agent can select a next state better than one already
proposed in the past. In this way, agents, although remaining self-interested, will
converge to a solution that is the best possible for all of them together, because it
represents the minimum joint decrement of the utility.

The proposed distributed dependencies satisfaction algorithm can be described as 
follows: 

1 . Each agent i select its next preferable alternative state, from those not yet proposed 
before. Let us suppose this is state a. 

duf = min(r/Mj^ ) 

2. Each agent i sends out to others: 

- its own preferable state as a new proposal 

- its own local decrement of the utility for that state 

3. When agent j receives the proposal (state a) from agent i, it calculates:

- its own local decrement of utility ($du_j^a$)

- the joint decrement of the utility:

$jdu^a = \sum_{k \in dag} du_k^a$, where $dag$ is the set of mutually dependent agents

- the minimum joint decrement of the utility already known ($jdu^m$)

4. and selects: 

- its next preferable state. Suppose it is state b. 

- if $du_j^b < jdu^m$, agent j proposes state b to the other agents

- else agent j accepts state m as the final proposal and negotiation ends. 
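
The steps above can be sketched in a few lines of Python. This is a minimal, centralized simulation under stated assumptions, not the authors' implementation: each agent's utilities are given as a hypothetical dictionary from state names to values, every agent can evaluate every state, and the negotiation is taken to end once no agent can propose a state whose local decrement is below the best joint decrement known so far. The names `decrement` and `negotiate` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Utilities = Dict[str, float]  # state name -> local utility (assumed representation)


def decrement(utilities: Utilities, state: str) -> float:
    """Decrement of utility: loss relative to the agent's own optimal state."""
    return max(utilities.values()) - utilities[state]


def negotiate(agents: List[Utilities]) -> Tuple[Optional[str], float]:
    """Return the agreed state m and its joint decrement of utility."""
    # Step 1: each agent ranks the states by increasing local decrement.
    queues = [sorted(u, key=lambda s, u=u: decrement(u, s)) for u in agents]
    best_state, best_jdu = None, float("inf")
    active = True
    while active:
        active = False
        for i, queue in enumerate(queues):
            if not queue:
                continue
            candidate = queue[0]
            # Step 4: concede only while the local decrement stays below
            # the minimum joint decrement known so far.
            if decrement(agents[i], candidate) >= best_jdu:
                continue
            queue.pop(0)
            active = True
            # Steps 2-3: only decrements are summed, never raw utilities.
            jdu = sum(decrement(u, candidate) for u in agents)
            if jdu < best_jdu:
                best_state, best_jdu = candidate, jdu
    return best_state, best_jdu


agents = [{"a": 10, "b": 8, "c": 3}, {"a": 2, "b": 7, "c": 9}]
print(negotiate(agents))  # ('b', 4)
```

In this example each agent first proposes its own optimum (a and c, each with joint decrement 7), and both then concede to state b, whose joint decrement of 4 is the minimum. Because a joint decrement is never smaller than any local one, stopping when the next local decrement reaches the best known joint value cannot miss a better state.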



Transfer of Compensations 

After agreeing on a global solution, the agents involved in the dependencies resolution
process generally obtain different local decrement of utility values and, therefore, some
agents are more penalized than others. In order to guarantee that all agents
involved in the distributed dependencies resolution get the same real decrement of
utility (rdu), the joint decrement of the utility is distributed among them
according to formula (9):

$rdu = \frac{jdu^m}{n}$, where $n$ is the number of agents   (9)



As a consequence, some agents have to pay a compensation value to others, and some receive one.
Since agent i has previously calculated $du_i^m$ as its local decrement of utility, the
compensation value is calculated according to formula (10):

$cValue_i = rdu - du_i^m$   (10)
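
Formulas (9) and (10) can be illustrated with a short sketch. It assumes that the local decrements of all agents for the agreed state are collected in one list; the function name `compensation_values` is hypothetical.

```python
from typing import List


def compensation_values(local_decrements: List[float]) -> List[float]:
    """Apply formulas (9) and (10) to the local decrements of the agreed state."""
    n = len(local_decrements)
    jdu = sum(local_decrements)  # joint decrement of utility of the agreed state
    rdu = jdu / n                # formula (9): real decrement, equal for all agents
    # Formula (10): a positive value means the agent conceded less than the
    # average and pays; a negative value means it receives a compensation.
    return [rdu - du for du in local_decrements]


print(compensation_values([1.0, 3.0, 5.0]))  # [2.0, 0.0, -2.0]
```

The compensation values always sum to zero, so the payments only redistribute the joint decrement among the agents.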



If the agent’s real decrement of utility is greater than its local decrement of
utility, it pays a compensation value to the others, calculated as the difference
between these two values. If not, the agent receives a compensation value.

4 Phased Commitments

We have seen how agents, representing individual autonomous enterprises, may reach 
an agreement through appropriate negotiation procedures. The contract that 
formalises that agreement should explicitly state all the commitments that those 
agents are due to satisfy all along the VO life cycle. 

A full commitment contract may be unable to deal with possible future, partially
unexpected, events. This fact has been recognized since the definition of the
original contract net protocol [11], where the possibility of a contract cancellation was
envisaged. More recently, other authors [12] have approached this subject in the
context of decommitting in a meeting scheduling application. However, it was
Sandholm [13, 14] who made a more systematic and relevant contribution to this
issue through the introduction of the concept of “leveled commitment” and associated
penalties. Contrary to the game theoretic approach, where contingency contracts are
established according to the occurrence or not of future events, Sandholm [14] allows
unilateral decommitments through the payment of calculated penalties. The resulting
contracts are then called “leveled commitment contracts”.

Three main aspects are related to this issue: 

• First, the problem of how to represent, in an unambiguous form, such a 
commitment including all relevant information about future agents’ attitudes. 

• Second, how to exploit this knowledge in order to correctly monitor the
next stages of the VO life cycle.

• Third, what to do in case of failure of what was previously negotiated and 
agreed and finally stated in the accepted contract. Can parties back out? In 
what circumstances can that happen? And, if the answer is yes, what 
procedure should follow? Should a new negotiation process start or (and) 
should appropriate penalties be enforced on the agents? 

Commitments have to be agreed and represented in such a way that at several 
different future points in time, they can be verified. Which procedure to follow after 
that verification should also be considered in the agreed contract. 

We envisage representing the negotiation outcome (the agreed contract) as a kind 
of frame where each slot represents pre-conditions plus a set of rules to be selected for 
possible application. Those rules whose conditional part has been verified indicate the 
appropriate action to be taken in those specific circumstances. 

A contract, including a set of phased commitments, can be represented as: 

Contract = <LAgts, Agt-P/S, Verification_procedure>, where:

• LAgts represents the list of agents that accept that contract.

• Agt-P/S ties each agent to the contribution (product or service) it is
committed to give to the VO:

Agt-P/S = {Agt-P/S_i}, Agt-P/S_i = {(Agt_i, Prod/Serv_i)}

• Verification_procedure indicates how and when to monitor the operation
procedures agreed through the contract. Verification_procedure is represented as:

Verification_procedure = <Pre-cond, Rule-set>, where:

• Pre-cond ∈ {event, time_point}

• event is a specific type of arrived message.

• time_point is a pre-specified point in time for checking current conditions.

• Rule-set = {<cond_i, action_i>}

• cond_i is a set of conditions to be checked after Pre-cond is true.

• action_i ∈ {dpenalty, dpenalty + re_negt, noact}, and:

- dpenalty represents a decommitment penalty value. 

- re_negt represents the re-negotiation action. 

- noact represents the case where no action is to be done. 

Decommitment penalties have to be calculated according to the other VO partners’ 
respective losses. It is our intention to enhance the negotiation protocol in such a way 
that the agents already know these penalties at the end of the negotiation phase. 
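
The contract structure described in this section could be represented, for instance, as follows. This is only a sketch under stated assumptions: conditions are modelled as Python predicates over a dictionary of observed facts, the first matching rule determines the action, and all class and field names (`Contract`, `VerificationProcedure`, `Rule`, `check`) are hypothetical rather than part of the proposal itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class Action(Enum):
    DPENALTY = "dpenalty"                   # pay a decommitment penalty
    DPENALTY_RENEGT = "dpenalty + re_negt"  # pay the penalty and re-negotiate
    NOACT = "noact"                         # no action is to be taken


@dataclass
class Rule:
    cond: Callable[[dict], bool]  # conditions checked once Pre-cond is true
    action: Action


@dataclass
class VerificationProcedure:
    pre_cond: str                 # an event type or a time point label
    rule_set: List[Rule]

    def check(self, trigger: str, facts: dict) -> Optional[Action]:
        """Return the action of the first rule whose condition holds."""
        if trigger != self.pre_cond:
            return None           # the pre-condition is not satisfied
        for rule in self.rule_set:
            if rule.cond(facts):
                return rule.action
        return Action.NOACT


@dataclass
class Contract:
    agents: List[str]               # LAgts
    contributions: Dict[str, str]   # Agt-P/S: agent -> product or service
    verification: VerificationProcedure


late_delivery = Rule(lambda f: f["delivered"] < f["ordered"], Action.DPENALTY_RENEGT)
contract = Contract(
    agents=["Agt1", "Agt2"],
    contributions={"Agt1": "engines", "Agt2": "assembly"},
    verification=VerificationProcedure("delivery_deadline", [late_delivery]),
)
print(contract.verification.check("delivery_deadline", {"delivered": 5, "ordered": 10}))
# Action.DPENALTY_RENEGT
```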



5 Conclusions and Discussion 

Electronic Institutions are general frameworks for helping in collaborative work in 
electronic environments. Electronic Institutions provide, sometimes enforce, rules and 
norms of behaviour and make available service facilities supporting both interaction 
and operation monitoring of computational entities. 

A Virtual Organization is a powerful example of the need for such collaborative
work, since different enterprises have to join together, temporarily, to achieve a
common business-oriented goal. This paper elaborates on how Electronic Institutions
can effectively help during the VO life cycle.

For the VO formation stage we have introduced a new negotiation algorithm,
called Q-Negotiation, which includes appropriate features for dealing with the
specific requirements of the VO scenario. An important requirement in the VO
scenario is that information must be kept private to individual enterprises, since they
are competitive by nature and do not want to reveal their market strategy to others.
The Q-Negotiation algorithm is able to keep information private to
individual enterprises and, at the same time, includes the capability to evaluate
multi-attribute proposals, to learn during the negotiation process, and to resolve attributes’
interdependencies. Let us discuss each one of these features separately. First,
multi-attribute evaluation is done by assigning relative preferences to attributes. Other studies
in multi-attribute evaluation [1, 2] generally impose the use of a concrete real value
that captures each attribute’s importance, which can sometimes be difficult to
quantify. Second, learning is performed by using an on-line reinforcement learning
algorithm during the negotiation process, through qualitative feedback, namely the
opponent’s comment on each proposal. Third, the inter-attribute dependencies
resolution process proposed in Q-Negotiation reaches the optimal solution while keeping
information private as much as possible. Other known approaches to
distributed dependencies resolution either reach non-optimal solutions [8, 9] or
require other agents’ private information to be made public [10].

For the VO operation stage we propose exploiting a phased
commitment that is established at the end of the negotiation process, during the
previous stage of the VO life cycle. This commitment is then specified in a contract. Through
phased commitments, the Electronic Institution is able to monitor the
behaviour of the participating entities at pre-specified moments foreseen in the
contract, according to time points or future events.

We intend to develop, in the future, knowledge representation and ontology [15]
services appropriate for Electronic Institutions.

Abstract. The present paper concentrates on one modeling approach for
homogeneous societies of agents in a given environment. It is an extension of the
existing Interactivist-Expectative Theory on Agency and Learning to multiagent
environments. The uniagent theory’s key concepts are expectancy and
learning through interactions with the environment. Motivated by the research 
done in the domain of imitation in humans, this paper introduces learning by 
imitation through interaction between akin agents. The social consequences of 
such an environment from the perspective of learning and emergence of 
language are discussed as well. 



1 Introduction 

In our work within the Interactivist-Expectative Theory of Agency and Learning
(IETAL) [8], we proposed and investigated a learning paradigm for autonomous agents
in a uniagent environment. The key concepts of this theory are expectancy and 
learning the environment through interactions, while building an intrinsic model of it. 
Depending on the set of active drives, the agent uses the appropriate projection of its 
intrinsic model in order to navigate within the environment on its quest to satisfy the 
set of active drives. 

In this paper we start from the existing results of the theory and propose the
basis of a multiagent theory where, apart from the notions of expectancy and learning
through interactions with the environment, we introduce interaction [2] between
similar agents, inspired by research results on the phenomenon of imitation [1], [4] in
both neurophysiology and psychology.

The paper is organized as follows. Section 2 summarizes some of the relevant 
results in research on the phenomenon of imitation in humans. In Section 3, we give
an introduction to the IETAL theory. In Section 4, we formalize an instantiation of
the IETAL theory in the agent that we call Petitage, which interacts with the
environment it is in, and builds its intrinsic model of it. By equipping the agent with a
special sensor for sensing similar agents, and enabling the agents to exchange each
other’s contingency tables, we allow an environment inhabited by
Petitage-like agents to function as a homogeneous multiagent world with socially
active agents. This environment and the emergence of language are the focus of the
discussion in Section 5. The last section summarizes the paper and states directions for
further investigation in the domain.

2 Imitation Revisited

In this section we summarize relevant results from research in neurophysiology
and psychology on imitation, which inform our approach to the
modeling of the multiagent society. This approach is justified by the efforts towards
a more efficient learning paradigm in agents with the embodied mind paradigm.

The concept of learning by imitation (learning by watching, teaching by showing)
is certainly not a new one. Thorndike [10] defines it as “[imitation is] learning to do
an act from seeing it done”, whereas Piaget [5] gives it a major role in
his theory, in which the ability to imitate develops through six stages.
After Piaget the interest in (movement) imitation diminished, partially because of the
prejudice that “mimicking” or “imitating” is not an expression of higher intelligence
[7]. Imitation is, though, far from being a trivial task. It is rather a highly creative
mapping between the actions, and their consequences, of the other and of oneself
(visual input, motor commands, consequences evaluation). Rizzolatti et al. [6]
discovered the so-called “mirror neurons”: neurons that fire both while one is performing
a motor action and while one is watching somebody else perform the same action.

This finding of Rizzolatti may give insights into our empathizing abilities as well as
our “other mind reading” abilities. It states that, in addition to the first person
point of view (subjective experience) and the third person point of view (scientific, or
objective stance), we have a special within-the-species shared point of view dedicated
to understanding others of the same species. Consequences for cognitive science and
AI include putting radical constraints on the space of possible models of the mind.

Within the classical, disembodied approach, recognizing and communicating with
others was not a real problem. New knowledge (i.e. new combinations of symbols)
was easily communicated via inherently linguistic terms, due to the assumption that
everybody and everything has access to some internal mirror representation [3].



3 Basics of the UniAgent Theory 

In this section we summarize the uniagent IETAL theory in order to facilitate the
communication of ideas in the remaining part of the paper.

The notion of expectancy has a central role in this theory: the agent, while
being in the environment, anticipates the effects of its own actions in the world. In
order to avoid any possible terminological confusion, we give the following
explanation. An agent is said to be aware of the environment it inhabits if it can
anticipate the results of its own actions. This means that, given some current percept
$p$, the agent can generate expectancies about the resulting percepts $p_1, p_2, \ldots$ if it applies
actions $a_1, a_2, \ldots$ After inhabiting some environment for a certain time, an agent builds a
network of such expectancy triplets $p_i$-$a_j$-$p_k$. This brings us to the second key concept
in our theory of agency, namely the concept of agent-environment interaction. As we
see, the main problems are: first, to learn the graph and, second, to know how to use it.

4 Formalization of the Agent

In this Section we give an algebraic formalization of the agent that inhabits the
multiagent environment [11]. Formalizing the agent and its interaction(s) is crucial
for the experimental setups in the exploration of the efficiency of the approach. The
agent gains its knowledge throughout its stay in the environment.

If we speak in terms of graphs, we may depict this situation with a graph whose
nodes are labeled, where the labels may be the same for different nodes. The only way a
node can then be recognized as different is to examine its context. The context of a node
is a tree-like data structure defined recursively as all followers of the node together
with their contexts. In [11], we have presented learning algorithms inspired by
biological systems.

Let $V = \{v_1, v_2, \ldots, v_n\}$ be a finite nonempty set of vertices and
$A = \{s_1, s_2, \ldots, s_m\}$ a finite nonempty set of actions with implicit ordering
induced by the indexing. Let the pair $G_s = (V, r_s)$, $s \in A$, where
$r_s \subseteq V \times V$, be an oriented regular graph whose matrix of incidence of the
relation $r_s$, for every $s \in A$, contains exactly one element 1 in each column.

The graph $G' = (V, \bigcup_{s \in A} r_s)$ will be called the graph of general connectivity
of the family of graphs $\{G_s : s \in A\}$. If the graph $G'$ is connected, then the triplet
$G = (V, A, r)$, $r \subseteq (V \times V) \times A$, defined as follows: $((v_1, v_2), s) \in r$
if and only if $(v_1, v_2) \in r_s$, $v_1, v_2 \in V$, $s \in A$, represents an oriented graph
with marked edges. $V$ is the set of vertices, $r$ the relation, and $A$ the set of actions of $G$.

Let $L = \{l_1, l_2, \ldots, l_k\}$ be a set of labels and $f : V \to L$ a surjection. $f$ is
called a labeling of the vertex set of $G$, and it endorses the possibility of modeling
perceptual aliasing.

The graph that we, as designers of the experimental setup, see, $G'' = (V, A, L, r, f)$,
is the Designer Visible Environment, and the graph $G''' = (L, r''', A)$ is the Agent
Visible Environment (model). This is what the agent is able to “see”.
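
The relation between the two environments can be illustrated with a small sketch. It assumes that the marked edges are stored as a dictionary from (vertex, action) pairs to successor vertices, so that every action has exactly one outcome at each vertex, and that the agent-visible relation is obtained by replacing vertices with their labels. The function name `agent_visible` and the example labels are hypothetical.

```python
from typing import Dict, Set, Tuple

Transitions = Dict[Tuple[str, str], str]  # (vertex, action) -> next vertex


def agent_visible(r: Transitions, f: Dict[str, str]) -> Set[Tuple[str, str, str]]:
    """Project the designer-visible relation onto the labels given by f."""
    return {(f[v], s, f[w]) for (v, s), w in r.items()}


# Three vertices on a cycle, one action; two vertices share the label "wall".
r = {("v1", "fwd"): "v2", ("v2", "fwd"): "v3", ("v3", "fwd"): "v1"}
f = {"v1": "wall", "v2": "wall", "v3": "door"}

seen = agent_visible(r, f)
# From the percept "wall" the same action leads to "wall" or to "door":
# this is perceptual aliasing as the agent experiences it.
print(sorted(seen))
# [('door', 'fwd', 'wall'), ('wall', 'fwd', 'door'), ('wall', 'fwd', 'wall')]
```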

During its stay in the environment, the agent is interacting with the environment, 
and builds its intrinsic representation of the environment (Fig 1). 

When, say, the hunger drive is activated for the first time, the agent performs
a random walk during which expectancies are stored in the associative memory. The
emotional contexts of these expectancies are neutral until food is sensed. Once this
happens, the emotional context of the current expectancy is set to a positive value. This value is
then exponentially decremented and propagated backwards along the recent
percepts.

Every subsequent time the hunger drive is activated, the agent uses the context
values of the expectancies to direct its actions. It chooses the action that
will lead to the expectancy with the maximum context value. If all the expectancies for the
current perceptual state are neutral, a random walk is performed. Again, when food is
sensed, emotional contexts are adjusted in the previously described manner.



Generate_Intrinsic_Representation (G: Interaction_Graph,
        Schema; GIR: Associative_Memory)
BEGIN_PROCEDURE
    Initialize (Ra = ∅); Initialize_Position (G; Position);
    Try (Position, (B1, S1));
    Add ([(λ, λ), (B1, S1)]; Ra);
    WHILE (Active_Drive_Not_Satisfied) DO
        Try (Position, (B2, S2));
        Add ([(B1, S1), (B2, S2)]; Ra);
        (B1, S1) := (B2, S2);
    END_WHILE
    Propagate_Context ((B1, S1), drive; GIR).
END_PROCEDURE

Try (Position: Location_In_Interaction_Graph, Schema;
        (B, S): Percepts_Actions_Pair)
BEGIN_PROCEDURE
    S := λ;
    Tryin (Position, (Add (S, Current_Percept), B));
    REPEAT
        Tryin (Position, B; (Add (S, Current_Percept), B))
    UNTIL NOT enabled (B)
END_PROCEDURE

Propagate_Context (d: drive; GIR: Associative_Memory)
BEGIN_PROCEDURE
    N := 0;
    WHILE (B1, S1) ∈ Projection_d (GIR) DO
        Projection_d (GIR) := exp(-N);
        INC (N)
    END_WHILE
END_PROCEDURE



Fig. 1. A possible implementation of the learning procedure for the autonomous agent. 
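
The context propagation and action selection described in this section can be sketched in Python as follows. This is an illustrative simplification, not a transcription of Fig. 1: expectancies are assumed to be (percept, action, next percept) triples, the walk history is a plain list, and keeping the larger value when an expectancy occurs twice is our own assumption. The function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Expectancy = Tuple[str, str, str]  # (percept, action, expected next percept)


def propagate_context(history: List[Expectancy],
                      context: Dict[Expectancy, float]) -> None:
    """Assign exp(-N) to the N-th most recent expectancy, walking backwards."""
    for n, expectancy in enumerate(reversed(history)):
        # Assumption: a repeated expectancy keeps its larger context value.
        context[expectancy] = max(context.get(expectancy, 0.0), math.exp(-n))


def choose_action(percept: str,
                  context: Dict[Expectancy, float]) -> Optional[str]:
    """Pick the action whose expectancy has the maximum context value."""
    candidates = [(value, e[1]) for e, value in context.items()
                  if e[0] == percept and value > 0.0]
    # None signals that all expectancies are neutral: perform a random walk.
    return max(candidates)[1] if candidates else None


walk = [("A", "right", "B"), ("B", "up", "C"), ("C", "right", "food")]
context: Dict[Expectancy, float] = {}
propagate_context(walk, context)   # food was sensed at the end of the walk
print(choose_action("B", context))  # up
```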

The way the agent “perceives” the environment is different when different drives 
are active. That means that it uses different instantiations, depending on the given set 
of drives. It is realistic to assume that the drives are ordered in a complete sub-semi- 
lattice manner (meaning that every subset of drives has its maximum within the 
structure of drives). 

The top of the drive structure in all agents in the multiagent society is the drive to
imitate the agents “in sight”. Once two agents sense each other, they shift to the
imitation mode, in which they exchange their contingency tables. The
consequences of the inter-agent interaction are discussed in the following section.



5 The Multiagent Society 

In this Section we give a description of a multiagent environment inhabited by
Petitage-like agents that are able to imitate their cohabitants. The problems of
inter-agent communication in heterogeneous multiagent environments are discussed at
the end of the Section.

As discussed before, the agent is equipped with a special sensor that senses
other Petitage-like agents in the same environment. The sensor works in parallel
with the other sensors. Sensing another Petitage takes the agent into its imitating
mode, during which the agents exchange their contingency tables. The tables are
updated with new rows.

Let us suppose that the contingency table of one of the agents has M rows, and that of
the other N rows. After the exchange of the contingency tables, both
agents update their tables with the union of their individual tables. In mathematical
terms, this means that the cardinality of the resulting relation (the number of rows of the
contingency table) will be at most M+N.

The agents might have been visiting different areas of the environment during their
walk through the environment, or areas that are perceptually very similar to each
other, which they have inhabited for durations of time $T_1$ and $T_2$, respectively.
Depending on the topology of those areas of the environment, and their similarity, the
cardinality of the intersection of the relations can range between 0 and min{M, N}, the first
corresponding to visits to perceptually different environments, and the latter to
visits to perceptually very similar environments.
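
These bounds are easy to check on a toy example. The sketch below assumes that a contingency table is a set of rows and that a row is a (percept, action, resulting percept) triple; both the row format and the name `exchange` are our own illustrative choices.

```python
from typing import Set, Tuple

Row = Tuple[str, str, str]  # (percept, action, resulting percept); assumed format


def exchange(table_a: Set[Row], table_b: Set[Row]) -> Set[Row]:
    """Both agents end the imitation stage with the union of their tables."""
    return table_a | table_b


a = {("p1", "s1", "p2"), ("p2", "s1", "p3")}                      # M = 2 rows
b = {("p2", "s1", "p3"), ("p3", "s2", "p1"), ("p1", "s2", "p3")}  # N = 3 rows

merged = exchange(a, b)
shared = a & b
# |A ∪ B| = M + N - |A ∩ B|, so the merged table never exceeds M + N rows.
print(len(merged), len(shared))  # 4 1
```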

The results from the uniagent IETAL theory cannot be generalized to state that the longer
an agent has been in the environment, the “longer” its contingency table will be.
Therefore, the “age difference” $|T_1 - T_2|$ has no influence on the shapes of the
learning curves. On the other hand, the learning curve decreases in time [8].
However, since the problem of cycling in a simulated environment is solved by
introducing random moves once cycling is detected, the odds that the agent
explores a more comprehensive part of the whole environment are increased.
Nevertheless, after the process of information interchange, the learning curve drops,
and lies in the band between the time axis and the lower of the individual learning
curves of the two agents.

At first glance, the society of Petitages could easily be ordered by the amount
of information contained in the contingency tables. However, due to the history of each
agent and the part of the environment it has visited, the agents in the environment
cannot be ordered in a linear manner by inclusion of the contingency tables.

Algebraically, the structure whose carrier is the set of all possible contingency tables in a given
environment, ordered by set inclusion, is, from the designer’s point of view, a lattice
that we refer to as the lattice P of all Petitages. The bottom element is an agent with no
knowledge of the environment, which has just been made an inhabitant of the environment,
and the top is an agent that has been in the environment long enough to know everything.
The agents will cluster in sublattices of agents that have been visiting similar areas
of the environment. As agents come into the environment and start learning, they
walk through P, too, moving in the direction of the top element. All the agents in the
environment at a given time t form a structure S(t), which is a general set partially
ordered by inclusion, with both comparable and incomparable elements. In other
terms, as a graph it may not be connected at a given point in time.
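
The ordering by inclusion can be made concrete with a few lines of Python. The sketch assumes that contingency tables are immutable sets of rows, so that the lattice order is the subset relation and the effect of an imitation instance is the set union (the join); `comparable` and `imitate` are hypothetical helper names.

```python
from typing import FrozenSet, Tuple

Table = FrozenSet[Tuple[str, str, str]]  # rows: (percept, action, percept)


def comparable(x: Table, y: Table) -> bool:
    """Two agents are comparable when one table includes the other."""
    return x <= y or y <= x


def imitate(x: Table, y: Table) -> Table:
    """After imitation both agents hold the join (union) of their tables."""
    return x | y


bottom: Table = frozenset()  # a newcomer with no knowledge of the environment
a1: Table = frozenset({("p1", "s1", "p2"), ("p2", "s1", "p3")})
a2: Table = frozenset({("p2", "s1", "p3"), ("p3", "s2", "p1")})

print(comparable(bottom, a1))  # True
print(comparable(a1, a2))      # False
joined = imitate(a1, a2)
print(a1 <= joined and a2 <= joined)  # True
```

The two agents a1 and a2 are incomparable before they meet; after the exchange both hold the same table, which lies above each of the original ones in P.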

Every element of the substructure of the hierarchy rooted at a given agent A
knows at least what A knows. The contingency table of A is a subset of the
tables of the agents comparable to and higher than A in P. The contingency table of the
root of the class of agents rooted at A in P will be referred to as an entry in their
lexicon. After an imitation instance, their lexicon will naturally
expand.

Fig. 2. The lattice of hierarchy of all possible agents (left). After the imitation, both agents
involved in the contingency table interchange are propagated upwards in the lattice P. CT
stands for contingency table. $A_1$ and $A_2$ are agents.



The lexicon should be understood as a system of activities that all the agents in the 
class can perform. The language of the AA society will consist of the union of 
lexicographic entries of all agents currently in the environment. 

No general interrelation can be stated between the knowledge and the age 
hierarchies of the agents in the world of Petitages. As stated before, no guarantee can 
be given that the longer an agent inhabits the environment, the more it “knows”. The 
same statement holds even for agents with the same lexicons, i.e. belonging to a class 
rooted at the same agent in P or S(t). 

The inability to exchange whole contingency tables due to hardware
or time constraints is a practical constraint that needs to be taken into serious
consideration when simulating a multiagent environment inhabited by Petitage-like
agents.

The problem of lack of time for convention (the other agent got out of sight before
the contingency tables were exchanged) is not as severe as the problem of hardware
memory constraints. The imitation stage pulls both agents higher in P; they actually
gain more knowledge of the environment. While addressing the memory problem,
policies of forgetting need to be devised.

6 Conclusions and Directions for Further Work



In this paper we proposed a model for a multiagent society, based on expectancy
and interaction. Based on results on imitation, we proposed a multiagent version of
the IETAL theory. The Petitage-like agents inhabit the same environment and interact
with each other using imitation. The drive to imitate is the highest in the hierarchy of
drives of all agents. After the convention of two agents, they interchange their
contingency tables and continue to explore the environment. The problems associated
with the imitation stages and the changes in the agents have been observed from
different angles.

The abundance of approaches towards multiagent systems that are not necessarily
compliant with mainstream AI gives us the motivation to explore the theory
further. In an upcoming stage of our research, we will simulate the World of Petitages,
and explore the problems that such a simulation will reveal, for a later implementation
of the approach in robotic agents.

Abstract. The situation calculus, originally conceived by John McCarthy, is one
of the main representation languages in artificial intelligence. The original papers
introducing the situation calculus also highlight the connection between the
fields of artificial intelligence and philosophical logic (especially modal logics of
belief, knowledge, and tense). Modal logic has changed enormously since the 60s.
This paper sets out to revive the connection between situation calculus and modal
logic. In particular, we will show that quantified hybrid logic, QHL, is able to
express situation calculus formulas often more naturally and concisely than the original
formulations. The main contribution of this paper is a new quantified hybrid
logic with temporal operators and action modalities, tailor-made for expressing
the fluents of situation calculus.



1 Introduction 

The seminal paper that McCarthy and Hayes published in 1969, Some Philosophical
Problems from the Standpoint of Artificial Intelligence, marks a watershed in artificial
intelligence. It is the key reference for one of its main representation languages: the
situation calculus. We will focus here on the original version of situation calculus ([13,14];
sometimes called the “snapshots” version, to distinguish it from other variants). The
most important construct of situation calculus is, unsurprisingly, the situation. As [13] has
it:



One of the basic entities in our theory is the situation. Intuitively, a situation is the 
complete state of affairs at some instant of time. . . . Since a situation is defined as 
a complete state of affairs, we can never describe a situation fully; and we therefore 
provide no notation for doing so in our theory. Instead, we state facts about situations in 
the language of an extended predicate calculus. Examples of such facts are 1. raining(s) 
meaning that it is raining in situation s. 

The situations are fully informed instances of the world of which we have limited knowledge,
but they still occur in the object language; this is what modal logicians now call a
hybrid language. Precisely the same intuition is present in the writings of Arthur Prior,
the founder of temporal logic [17]. McCarthy and Hayes [14] praise Prior’s work. They
include his temporal operators in the situation calculus and they note the similarity
of their use of situation variables to Prior’s time-instants. But that is it. Apart from this
promising beginning, the languages of situation calculus and the modal languages based
on Kripke’s and Prior’s work have always stayed far removed from each other.

We think this is at least partly due to historical reasons. First of all, Prior’s writing is
notoriously difficult. Secondly, in the late 60’s first order modal logic was a hot topic but
the debate centered around all its philosophical problems. At that time hardly anyone saw
it as a useful language for doing knowledge representation, with McCarthy and Hayes
as notable exceptions. In fact, Prior is an exception too; he saw that modal logic could be
used for a general (dynamic) theory of information. Another important reason was the
inadequate expressive power of the available modal languages for the purposes McCarthy
and Hayes had in mind. Since the late 60’s, this situation has changed considerably. First
and foremost, we know now that actions can be naturally represented in dynamic logic,
a branch of modal logic.¹ Secondly, nowadays modal logic has become a respectable
member in the field of knowledge representation, be it under the name of description
logic.² Finally, the 90’s saw the emergence of a branch of modal logic called hybrid
logic which took up, or sometimes reinvented, many of Prior’s ideas. E.g., seemingly
unaware of Prior, Passy and Tinchev [15] argue for the introduction of names for states in
dynamic logic. Hybrid logic adds to modal logic explicit reference to states, a mechanism
to bind variables to states (the modal-logical term for situation), and a holds operator
$@_i \varphi$, allowing one to express that a formula $\varphi$ holds at a state named $i$.
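
To make the role of state names and of the holds operator concrete, here is a minimal evaluator for a propositional fragment with nominals and @. It is a sketch under our own assumptions: formulas are encoded as nested tuples, the model is a finite Kripke structure in which every nominal names exactly one state, and the binder is left out. The class `Model` and its encoding are illustrative, not part of the logic defined in this paper.

```python
# Formulas as nested tuples (an assumed encoding):
#   ("prop", p) | ("nom", i) | ("not", f) | ("and", f, g) | ("dia", f) | ("at", i, f)


class Model:
    def __init__(self, states, relation, valuation, names):
        self.states = states        # set of states (situations)
        self.relation = relation    # set of (state, successor) pairs
        self.valuation = valuation  # proposition -> set of states where it holds
        self.names = names          # nominal -> the unique state it names

    def holds(self, formula, state) -> bool:
        op = formula[0]
        if op == "prop":
            return state in self.valuation.get(formula[1], set())
        if op == "nom":   # a nominal is true exactly at the state it names
            return self.names[formula[1]] == state
        if op == "not":
            return not self.holds(formula[1], state)
        if op == "and":
            return self.holds(formula[1], state) and self.holds(formula[2], state)
        if op == "dia":   # true if some successor satisfies the subformula
            return any(self.holds(formula[1], t)
                       for (s, t) in self.relation if s == state)
        if op == "at":    # @_i f: jump to the state named i and evaluate f there
            return self.holds(formula[2], self.names[formula[1]])
        raise ValueError(f"unknown operator: {op}")


m = Model(
    states={"s0", "s1"},
    relation={("s0", "s1")},
    valuation={"raining": {"s1"}},
    names={"i": "s0", "j": "s1"},
)
print(m.holds(("at", "j", ("prop", "raining")), "s0"))          # True
print(m.holds(("at", "i", ("dia", ("prop", "raining"))), "s1"))  # True
print(m.holds(("at", "i", ("prop", "raining")), "s0"))          # False
```

Note that the truth value of an @-formula does not depend on the state at which it is evaluated, which mirrors the way situation arguments are made explicit in a situation calculus fact such as raining(s).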

The purpose of this paper is to introduce hybrid logic to the artificial intelligence
community. We will do this by showing that hybrid logic is very well suited to express
what is normally formulated in the situation calculus. We have chosen a comparison
with the very first situation calculus language, from [14]. Our prime reason for choosing
[14], apart from the fact that it started the field, is that one can feel their struggle
with the first order language they are using. They have to introduce λ-abstraction, and
all the time they introduce abbreviations to make their formulas look intuitive. These
abbreviations foreshadowed a number of later technical developments in modal logic
(e.g., van Benthem’s celebrated standard translation into first order logic). In fact, we
see McCarthy and Hayes as forerunners of the use of modal logic as a knowledge
representation language and would not be surprised if they had used hybridized first order
modal logic to state the situation calculus, if only the right ingredients had been available
when they wrote their article.

The rest of this paper is structured as follows. We start with a brief introduction
to hybrid logic. In the main part of the paper we show how to express typical situation
calculus statements in hybrid logic. Here we gently introduce the notions of hybrid logic
and show their use in examples. Rigorous definitions of its syntax and semantics are
provided in the appendix. We end with a discussion of the presented work.



* Dynamic logic originates with V. Pratt [16]. The recent monograph [10] contains many applica- 
tions of dynamic logic to computer science. The rendering of a version of the situation calculus 
in GOLOG by Levesque, Pirri and Reiter [12] is also based on dynamic logic. 

^ Description logic [5,8] evolved out of Brachman and Schmolze’s knowledge representation lan- 
guage KL-ONE [6]. There are now a number of very fast DL provers for very expressive (exp- 
time complete) languages, e.g., DLP and Racer, cf., the DL webpage http://dl.kr. org/. 
2 Hybrid Logic

The rapidly growing field of hybrid logic, although rooted in the philosophical logic
of Prior, is now being recognized as a tool in the field of knowledge representation.
Hybrid logic has close connections with the field of description logic (cf. the page
http://dl.kr.org/ or [8]). At present, several description logic theorem provers
are being adjusted to handle the full nominals of hybrid logic. These provers handle
propositional hybrid fragments, whose worst-case complexity is exponential time, with
surprising efficiency. The proof and model theory of propositional hybrid logic are by
now very well understood [3,2]. Recent unpublished work on first order hybrid logic
indicates that it has enormous advantages over first order modal logic. For instance, a
complete analytic tableau system exists which also yields interpolants. One of the strong
indications that something is missing in the usual formulation of first order modal logic
is its failure of the interpolation property [9]. The computational and applied logic group
at the University of Amsterdam is currently implementing a resolution-based theorem
prover for hybrid logic. Carlos Areces maintains a web page devoted to hybrid logic at
http://www.hylo.net. There have been a number of hybrid logic (HyLo) workshops.
The next will be held as a LICS-affiliated workshop during the summer of 2002.



3 Situation Calculus as Hybrid Logic, First Steps 

In this section we argue that hybrid logic is a formalism excellently suited to speaking
about situations and fluents. We do this by reviewing the key examples in [14] and
reformulating them in hybrid logic. The hybrid language will be introduced informally
and step by step. A rigorous formal definition of the resulting quantified hybrid logic
can be found in the Appendix.

McCarthy and Hayes seem very keen to suppress the situation argument in their
formulas, just as in first order modal logic. This shows in all the example formulas in section
2 of [14]. They find it unnatural (and going against natural language practice) to add
an extra argument to each predicate symbol for the situation. For example, “John loves
Mary” has to be expressed as $love(j, m, s)$, where $s$ refers to a situation. For this reason
they introduce “abbreviations” in which this extra argument is suppressed. (We write this
between quotes as the syntactical status of these formulas is not always clear.) Still, they
cannot do this in all cases because they sometimes need to refer to situations explicitly.
They note the similarity with Prior's nominals:

The use of situation variables is analogous to the use of time-instants in the calculi of 
world-states which Prior [17] calls U-T calculi. [14, p.480] 

We will now show that the modern treatment of Prior’s ideas which has become known 
under the name of hybrid logic provides exactly the linguistic elements that McCarthy 
and Hayes seemed to be searching for. 

The two most important semantic constructs in the situation calculus are the situation 
and the fluent. A situation is the complete state of the universe at an instant of time. A 
fluent is a function whose domain is the set of situations. Propositional fluents are fluents 
whose range is the set of truth values {true, false}. Situational fluents are those whose
range is the set of situations itself. 

We start by considering propositional fluents. The key idea of situation calculus is
that the meaning of every expression is a fluent. If we equate situations with the possible
worlds from Kripke semantics, following the suggestion in [14, p.495], then sentences
in quantified modal logic express propositional fluents. For example, the meaning of the
sentence “John walks” is traditionally given as the set of possible worlds in which the
sentence “John walks” is true. This set of course uniquely determines a propositional
fluent.

Key idea of modal logic: Every first order modal logical sentence expresses a
propositional fluent. It does so without referring explicitly to situations. In fact, in traditional
modal logic one cannot refer to the situations (more traditionally called “worlds”) in the
models. In quantified hybrid logic (QHL), too, every sentence expresses a propositional
fluent. But in addition one can refer to situations and indicate that a formula holds at a
certain situation.

Names for situations and a holds operator. But McCarthy and Hayes need more
expressive power than quantified modal logic has to offer. They want to be able to express “At
situation s, ‘John walks’ holds”.^ This is not possible in quantified modal logic because
it contains no machinery to refer to possible worlds.

This is where Prior's ideas and their modern treatment in the form of hybrid logic
come into play. For the moment, add a second sort of variables, called nominals, to
the language of first order logic. Every nominal is a formula, and nominals can be freely
combined to form new formulas. In addition, whenever $i$ is a nominal and $\varphi$ is a formula,
then $@_i\varphi$ (pronounced: at $i$, $\varphi$) is also a formula.

The function of nominals is to name situations. The meaning of a nominal $i$ (an
atomic formula in hybrid logic) in a model will be the propositional fluent which is true
only for the unique situation that is named by $i$ in the model. $@_i\varphi$ adds a holds-operator
to first order logic: $@_i\varphi$ states that the formula $\varphi$ holds at the situation named $i$. Thus
the meaning of $@_i\varphi$ is the constant propositional fluent which sends every situation to
true if $\varphi$ holds at the situation named $i$, and every situation to false otherwise.
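
The semantics of nominals and of the holds operator can be made concrete with a small evaluator. The following Python sketch is illustrative only: the tuple encoding of formulas and the names Model, holds and fluent are assumptions of this example, not notation from [14] or from the appendix. It covers just atoms, nominals, negation, conjunction and @.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A toy propositional model (hypothetical encoding, for illustration)."""
    situations: frozenset  # the set of situations
    nominal: dict          # nominal -> the unique situation it names
    facts: dict            # situation -> set of atomic sentences true there


def holds(model, s, formula):
    """Decide whether `formula` is true at situation `s`.

    Formulas are nested tuples:
    ("atom", p), ("nom", i), ("not", f), ("and", f, g), ("at", i, f).
    """
    op = formula[0]
    if op == "atom":
        return formula[1] in model.facts[s]
    if op == "nom":
        # A nominal is true only at the situation it names.
        return model.nominal[formula[1]] == s
    if op == "not":
        return not holds(model, s, formula[1])
    if op == "and":
        return holds(model, s, formula[1]) and holds(model, s, formula[2])
    if op == "at":
        # @_i f: jump to the situation named i and evaluate f there.
        return holds(model, model.nominal[formula[1]], formula[2])
    raise ValueError(f"unknown operator: {op!r}")


def fluent(model, formula):
    """The propositional fluent of a formula: a map from situations to truth values."""
    return {s: holds(model, s, formula) for s in model.situations}


m = Model(
    situations=frozenset({"s0", "s1"}),
    nominal={"i": "s0"},
    facts={"s0": {"walks(john)"}, "s1": set()},
)
walks = ("atom", "walks(john)")
print(fluent(m, walks))                # varies with the situation
print(fluent(m, ("at", "i", walks)))   # constant: the same value at every situation
```

In this toy model the fluent of the plain sentence depends on the situation, whereas the fluent of the @-formula is constant, as described above.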

Let's consider the first example from [14, p.478]. McCarthy and Hayes want to
“assert about a situation s that person p is in place x and that it is raining in place x.”
This is expressed by $at(p, x, s) \land raining(x, s)$. Not being satisfied with this notation,
they give two other possible equivalent notations:

$[at(p, x) \land raining(x)](s)$ (1)

$[\lambda s'.\, at(p, x, s') \land raining(x, s')](s)$. (2)

In QHL all these are expressible by different formulas without lambda abstraction.
The fluent $\lambda s'.\, at(p, x, s') \land raining(x, s')$ is simply expressed in QHL by
$at(p, x) \land raining(x)$. The formulas (1) and (2) are then expressed by
$@_s(at(p, x) \land raining(x))$, an almost literal translation of the statement in natural
language. Finally, the original formulation is expressed by distributing $@_s$ over the
conjunction, as in $@_s at(p, x) \land @_s raining(x)$.

^ The holds operator plays an important role in a number of knowledge representation formalisms,
for instance in Allen's work on events and intervals [1] and in Kowalski's event calculus [11].

Theories and definitions. There is a second reason why McCarthy and Hayes want
explicit reference to situations. To express laws of nature, definitions or other information
which is supposed to be true in all situations, you have to universally quantify over
situations. They give the example of a kind of transitivity for the predicate $in(x, y, s)$,
which expresses that $x$ is in the location $y$ in situation $s$:

$\forall x \forall y \forall z \forall s.\,(in(x, y, s) \land in(y, z, s) \rightarrow in(x, z, s))$ (3)

$\forall x \forall y \forall z \forall.\,(in(x, y) \land in(y, z) \rightarrow in(x, z))$. (4)

In the second statement the situation argument is suppressed and $\forall.$ is meant to implicitly
quantify over all situations. In modal terminology $\forall.$ functions as a universal modality.
In description logic a special status is given to statements which are supposed to be true
in all situations. They are placed in what is called the T-Box (for Theory Box). This is
the natural place to collect definitions and other laws which hold universally. We adopt
this T-Box machinery and express (3) and (4) simply by putting the QHL sentence (5)
in the T-Box.

$\forall x \forall y \forall z\,(in(x, y) \land in(y, z) \rightarrow in(x, z))$ (5)

Note that this is almost literally the formulation (4) which is preferred in [14], except 
that the unappealing empty quantifier is replaced by the T-Box. 
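
To illustrate the T-Box reading, the following Python sketch checks sentence (5) in every situation of a small model. The encoding is an assumption of this example: the per-situation interpretation of the predicate in is given as a set of pairs, and the names DOMAIN, IN, transitive_at and tbox_holds are hypothetical.

```python
from itertools import product

# Hypothetical toy model: a shared domain D and, for each situation s,
# an interpretation of the binary predicate `in` as a set of pairs.
DOMAIN = {"key", "box", "room"}
IN = {
    "s0": {("key", "box"), ("box", "room"), ("key", "room")},
    "s1": {("key", "box"), ("box", "room")},  # ("key", "room") is missing
}


def transitive_at(s):
    """Sentence (5) evaluated locally at s: no situation argument appears."""
    rel = IN[s]
    return all(
        (x, z) in rel
        for x, y, z in product(DOMAIN, repeat=3)
        if (x, y) in rel and (y, z) in rel
    )


def tbox_holds(sentence, situations):
    """A T-Box sentence must be true in every situation of the model."""
    return all(sentence(s) for s in situations)


print(transitive_at("s0"), transitive_at("s1"))
print(tbox_holds(transitive_at, IN))
```

The sentence itself never mentions a situation; the quantification over situations lives entirely in tbox_holds, which plays the role of the T-Box in this sketch.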

Prior's temporal operators. In section 2 of [14], Prior's temporal operator $F$ is introduced
in the situation calculus. Here it becomes clear that the formalism used is ill-suited: only
with explicit $\lambda$-abstraction can one make a simple causality assertion. $F(\pi, s)$ means that
“the situation s will be followed (after an unspecified time) by a situation that satisfies the
fluent $\pi$”. To describe the temporal aspect of situations, McCarthy and Hayes postulate
a function $time$ from the set of situations to a set of time-points. The latter set comes with
the usual (linear) earlier than ordering.

Now (6) is the formalization of the assertion that “if a person is out in the rain, he
will get wet”.

$\forall x \forall p \forall s\,[raining(x, s) \land at(p, x, s) \land outside(p, s) \rightarrow F(\lambda s'.\, wet(p, s'), s)]$. (6)

This is also too much for McCarthy and Hayes and they quickly suppress explicit mention
of situations, yielding

$\forall x \forall p \forall.\,[raining(x) \land at(p, x) \land outside(p) \rightarrow F(wet(p))]$. (7)

If we delete the empty quantifier $\forall.$ in (7) and put the result in the T-Box, we get the
formalization in temporal QHL.

In temporal QHL, Prior's temporal operators $F$ and $P$ are added to the language:
whenever $\varphi$ is a formula, $F\varphi$ and $P\varphi$ are also formulas. Their meaning is evaluated
locally in a situation: $F\varphi$ is true in a situation $s$ if there exists a situation $s'$ such that
$time(s) < time(s')$ and $\varphi$ is true at $s'$. The meaning of $P\varphi$ is defined similarly but with
$s'$ before $s$. Thus $F\varphi$ is true in a situation $s$ if there exists a situation in the future of $s$ at
which $\varphi$ is true. $P\varphi$ expresses the same thing, but with respect to the past.
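
The local evaluation of the two temporal operators can be sketched as follows. This Python fragment is only an illustration: fluents are modelled as Python predicates on situations, time is a dictionary from situations to integer time points, and the function names future, past, implies and tbox are assumptions of the example. It checks the T-Box version of (7) for a single fixed person and place, so the first order quantifiers are left out.

```python
# Hypothetical toy model: three situations on a linear flow of time.
TIME = {"s0": 0, "s1": 1, "s2": 2}


def future(phi):
    """F phi: phi is true at some situation strictly later than s."""
    return lambda s: any(TIME[s] < TIME[t] and phi(t) for t in TIME)


def past(phi):
    """P phi: phi is true at some situation strictly earlier than s."""
    return lambda s: any(TIME[t] < TIME[s] and phi(t) for t in TIME)


def implies(phi, psi):
    """Material implication between two propositional fluents."""
    return lambda s: (not phi(s)) or psi(s)


def tbox(phi):
    """A T-Box sentence has to hold in every situation."""
    return all(phi(s) for s in TIME)


# Fluents for one fixed person: out in the rain at s0 and s1, wet at s2.
out_in_rain = lambda s: s in {"s0", "s1"}
wet = lambda s: s == "s2"

rule = implies(out_in_rain, future(wet))
print(tbox(rule))
print(past(out_in_rain)("s2"))
```

Both operators are evaluated relative to the current situation only; no situation variable appears in the fluents themselves.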
Actions. The largest change in the language comes from our treatment of actions as
compared to that in [14]. (A related approach is taken by Levesque, Pirri and Reiter
[12]; cf. also Reiter's book [18].) We treat actions as in dynamic logic [10] and introduce
a modality for every action. McCarthy and Hayes [14] deal with actions through the
situational fluent $result(p, a, s)$. In this, $p$ is a person, $a$ an action and $s$ a situation. The
value of $result(p, a, s)$ is the situation that results when $p$ carries out $a$, starting in $s$. If
the action does not terminate, $result(p, a, s)$ is considered undefined.

Note that $result(p, a, s)$ is a function with the set of situations as its range. Using
functions one can only handle deterministic actions. Another drawback of this
representation is the use of partial functions: it is unclear what truth value a formula should
receive when some of its arguments are undefined. Reiter [18] has similar problems,
which lead to the introduction of “ghost situations.” Dynamic logic offers a solution to
these problems, but pays the price that explicit reference to situations is not possible in
the language. As we will see, when this is needed it can be elegantly done in hybrid logic.
To simplify matters, we just consider actions and let the actor be implicit. So assume
there is a set ACT of primitive actions. Then whenever $\varphi$ is a formula and $a \in ACT$ is
an action, $\langle a\rangle\varphi$ and $[a]\varphi$ are also formulas. $\langle a\rangle\varphi$ is true in a situation $s$ if there exists a
situation $s'$ which is the result of carrying out $a$ in $s$ and $\varphi$ is true in $s'$. $[a]\varphi$ is defined
dually, so that $\varphi$ needs to be true in all situations $s'$ which result from carrying out $a$
in $s$. Thus if $a$ is a deterministic action, $\langle a\rangle\varphi$ expresses that $\varphi$ is true in the situation
$result(a, s)$.

McCarthy and Hayes use $result$ to express certain laws of ability of the form
$@_s\varphi \rightarrow @_{s'}\psi$ with $s' = result(a, s)$, expressing that if $\varphi$ holds at $s$, then $\psi$ is true in the
situation which is the result of carrying out $a$ in $s$. With action modalities one can make more
fine-grained distinctions. $@_s\varphi \rightarrow \langle a\rangle\top$ expresses that $a$ can be carried out in situation $s$
if $\varphi$ holds there. $@_s\varphi \rightarrow [a]\psi$ expresses that if $a$ is carried out in $s$ under the assumption
of $\varphi$, then $\psi$ is true in every resulting situation (though there need not exist one). Here are
two more examples of properties which cannot be expressed in situation calculus (or, for
that matter, in dynamic logic), but can in the hybrid formalism: $@_s\langle a\rangle\top$ expresses that
it is possible to carry out action $a$ successfully in situation $s$; $@_s[a]Ps$ expresses that the
situation which results after carrying out action $a$ in situation $s$ is later in time than $s$.
In plain words this formula expresses that it takes time to perform $a$. The combination
of actions into strategies is immediate in this approach. Whenever $\varphi$ is a formula and
$a_1, \ldots, a_n \in ACT$ are actions, $\langle a_1\rangle \cdots \langle a_n\rangle\varphi$ and $[a_1] \cdots [a_n]\varphi$ are also formulas.
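
The difference between the two action modalities shows up as soon as an action is nondeterministic or cannot be carried out. The Python sketch below is an illustration under assumptions of our own: each action is given by a set of (before, after) pairs playing the role of its accessibility relation, fluents are predicates on situations, and the names R, diamond, box and at are hypothetical.

```python
# Hypothetical accessibility relations, one per action, as sets of (s, s') pairs.
R = {
    "toss": {("s0", "heads"), ("s0", "tails")},  # nondeterministic
    "open": set(),                                # can never be carried out
}


def diamond(action, phi):
    """<a> phi: some situation resulting from the action satisfies phi."""
    return lambda s: any(phi(t) for (u, t) in R[action] if u == s)


def box(action, phi):
    """[a] phi: every situation resulting from the action satisfies phi."""
    return lambda s: all(phi(t) for (u, t) in R[action] if u == s)


def at(situation, phi):
    """@_s phi, with situations used directly as their own names."""
    return lambda _current: phi(situation)


top = lambda s: True
heads = lambda s: s == "heads"

print(at("s0", diamond("toss", top))("s0"))    # toss can be carried out in s0
print(at("s0", diamond("toss", heads))("s0"))  # heads is a possible outcome
print(at("s0", box("toss", heads))("s0"))      # but it is not a guaranteed one
print(at("s0", box("open", heads))("s0"))      # vacuously true: no outcome exists
```

Strategies are obtained by nesting, for example diamond("toss", diamond("toss", top)), without any change to the evaluator.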

4 Discussion and Conclusions 

The seminal paper that McCarthy and Hayes [14] published in 1969 marks a watershed
in artificial intelligence. Its importance can hardly be overestimated: apart from
introducing the situation calculus as one of the main representation languages in artificial
intelligence, the paper is famous for singling out a number of fundamental problems
that set artificial intelligence's research agenda for years to come. Amongst its most
important contributions is its role in the identification of the monotonicity of classical
logic as a fundamental problem for intelligent robots; and perhaps it is most famous for
introducing the frame problem (an area of unsurpassed activity in artificial intelligence).
Both these fundamental problems resulted in important research traditions (see [7] for an
overview of the field of non-monotonic reasoning, and see [19] for a survey of the frame
problem). Nowadays, the ideas of [14] seem to have reached their ultimate success:
they are part of the common knowledge and taken for granted by most researchers.
Nevertheless, we feel that there are more than historical reasons for re-appraising [14].

A less frequently discussed contribution of the original paper is that it highlighted
the connection between the fields of AI and philosophical logic (especially modal logics
of belief, knowledge, and tense). This is even more extraordinary considering that the
formulation of these modal logics in terms of Kripke semantics was a recent development
in the 60s, and at that time part of a rather peripheral area in logic, plagued by deep
philosophical problems. However, modal logic has also progressed since the 60s and
broadened its subject matter. As an illustration, the recent monograph [4] starts by stating
that “modal languages are simple yet expressive languages for talking about relational
structures”. It is this view of modal logic as a multi-purpose knowledge representation
language which holds the promise of shedding new light on some of the fundamental
problems of knowledge representation. Arthur Prior already held this view; now it is being
fully developed in the fields of description logic [8] and hybrid logic [3].

The main contribution of this paper is a new quantified hybrid logic with temporal
operators and action modalities, tailor-made for expressing the fluents of situation
calculus. We have shown that in this quantified hybrid logic, situation calculus formulas can be
expressed more naturally and concisely than in the original formulations. Moreover, it comes
with additional operators, such as a downarrow binder, that may enhance its expressive
power beyond the original situation calculus. More generally speaking, the aim of this
paper was to revive the connection between situation calculus and modal logic. This aim
can perhaps best be viewed as an effort to bring back together two research traditions
that have worked independently for many years. This may also help to highlight some of
the common interests of knowledge representation and modal logic. We can only hope
that this inspires further collaboration and a fruitful exchange of ideas between the two
communities.
Appendix: Formal Definition of Quantified Hybrid Logic

The language of quantified hybrid logic QHL has a set NOM of nominals, a set ACT of
action statements, a set FVAR of first order variables, a set CON of first order constants,
and predicates of any (including nullary) arity. The terms of the language are the
constants from CON plus the first order variables from FVAR. The atomic formulas are all
symbols in NOM together with the usual first order atomic formulas generated from the
predicate symbols and equality using the terms. Complex formulas are generated from
these according to the rules

$@_n\varphi \mid \neg\varphi \mid \varphi \land \psi \mid \varphi \lor \psi \mid \varphi \rightarrow \psi \mid \exists x\varphi \mid \forall x\varphi \mid F\varphi \mid P\varphi \mid \langle a\rangle\varphi \mid [a]\varphi$

Here $n \in NOM$, $x \in FVAR$, and $a \in ACT$. These formulas are interpreted in situation
calculus models. Such a model is a structure
$(S, time, T, <, \{R_a\}_{a \in ACT}, I_{nom}, D, I_{con}, I_s)_{s \in S}$ such that

o $S$ is a set of situations;
o $time$ is a function from $S$ to the set of time points $T$;
o $(T, <)$ is a linearly ordered flow of time;
o $\{R_a\}_{a \in ACT}$ is a set of binary relations on $S$, one for each action $a \in ACT$;
o $I_{nom}$ is a function assigning members of $S$ to nominals;
o $I_{con}$ is a function assigning elements of $D$ to constants in CON;
o for each $s \in S$, $(D, I_s)$ is an ordinary first order model.

The satisfaction relation, which states when a formula $\varphi$ is true in situation $s$ in model
$\mathfrak{M}$ under the (variable) assignment $g$, simply follows the recursive construction of the
language. For a full description of QHL with temporal operators and action modalities, the
reader is referred to the long version of the paper at
http://www.illc.uva.nl/~kamps/sitcalc/.
Abstract. Following the introduction of Dynamic Logic Programming
in [1], the language of updates LUPS was introduced in [2]. Whereas 
Dynamic Logic Programming provides a meaning to sequences of logic 
programs, each of them representing a state of the world, LUPS allows 
the specification of such states and state transitions. 

In this paper, we take a closer look at the language LUPS and identify 
one problem with its semantics and a possible, important, extension to 
its set of commands. We then propose an extension to the syntax of 
LUPS as well as a new semantics that solves the identified problem. We 
illustrate the changes by means of two examples. 



1 Introduction and Motivation 

In the past few years, research in the area of Nonmonotonic Reasoning has
devoted some attention to the problem of dynamically adapting a knowledge base
(KB) to correctly represent a world that changes, i.e. how to update a KB. For
the case where the KB is a theory in classical propositional logic, adequate
solutions have been proposed in [8] and [15]. In [12,13], these solutions were adapted
to allow for the update of logic programs and deductive databases, following the
so-called interpretation update approach. This approach has been shown to be
inadequate when applied to nonmonotonic theories, often leading to counterintuitive
results, as pointed out in [9]. Since then, several approaches for updating KBs
represented by logic programs have been proposed [1,3,4,9,10,14,16]. Each of
these approaches proposes a semantics for what the outcome of the update of
a logic program by another such program ought to be or, more generally, what
the meaning of a sequence of logic programs should be. For a discussion and
comparison of these approaches see [4].

In [2], the authors argue that besides assigning a meaning to a sequence of
logic programs, one also needs a language to specify how such a sequence of
programs is to be constructed, i.e., besides declaratively specifying the states of a
KB, it is advantageous to also declaratively specify the state transitions. These
state transitions should be allowed to depend on the states themselves. To
this purpose, the language of updates LUPS [2] was introduced. In the LUPS
framework, the next state of the KB is produced according to a set of commands
and the KB at the current state. Such LUPS commands allow the specification
of statements such as “a certain rule should belong to the next state if some
condition holds in the current state”, which would be represented by the
command “assert Rule when Condition”. Similar commands exist for the
retraction of rules, for the specification of permanent assert commands, i.e. assert
commands that are to be executed at every state after being issued, and for the
cancellation of such persistent commands. For further motivation for the need
for a language such as LUPS, and some application examples, see [2]. Throughout the
remainder of this paper we assume that the reader is familiar with LUPS and its
Dynamic Logic Programming (DLP) based semantics. In the Appendix we provide
an overview of DLP and LUPS with all the relevant intuitions and definitions.



1.1 Motivation 

In LUPS, the authors introduced a class of commands whose immediate effect
should only hold in the successor state and should not persist by inertia in
subsequent states. Non-inertial commands of this kind are indicated by the keyword
event (e.g. “assert event Rule when Condition”).

According to the intuitive reading above, if we want to update an initial KB,
$P_1$, with two consecutive updates $U_2$ and $U_3$, such that $U_2$ only contains such
non-inertial commands, and $U_3$ is empty, it seems reasonable to expect that after
the second update, $U_3$, what holds true is exactly what held true before
the first update, i.e. what holds true at $P_1$. Unfortunately this is not the case in
LUPS, as illustrated by the following example:

Example 1. Consider the simple case where $P_1 = \{a \leftarrow\}$, possibly obtained by
a past update command such as assert $a \leftarrow$, and the following sequence of
updates:

$U_2 = \{\text{assert event } a \leftarrow\}$

$U_3 = \{\}$

At state 3, i.e. after the update $U_3$, according to the semantics of LUPS we have
$M_3 = \{\}$ as the only stable model, i.e. $a$ is not a consequence of the knowledge
base at state 3.

In this example, $M_3 = \{\}$ is the only stable model because the command
assert event $a \leftarrow$ asserts the rule $a \leftarrow$ at state 2, but then causes the removal
of all rules (past and present) of the form $a \leftarrow$, i.e. both the rule specified by
$U_2$ and the rule of $P_1$ are removed. We argue that $M_3 = \{a\}$ should be the only
stable model at state 3, because the command assert event $a \leftarrow$ should not
affect (remove at state 3) the rule $a \leftarrow$ that was previously asserted at state 1,
i.e. the rule in $P_1$. Let us look at another example:

Example 2. Consider a slight modification of the previous example, such that
the initial program is now $P'_1 = \{a \leftarrow not\ \bot\}$ (where, as usual, $\bot$ is a reserved
proposition with the property of being false in every stable model, i.e. $not\ \bot$
belongs to every stable model). If the same update sequence is performed, after
the update $U_3$ we have $M'_3 = \{a\}$ as the only stable model.

Updates are somewhat syntactical in nature (cf. [1]), but in this example
the syntactical difference should not lead to a difference in behaviour. We argue that both
examples should have the same outcome, i.e. they should both have M = {a} as
the only stable model at state 3. To avoid this problem, we argue that an event
command should be exerted on the single asserted (or retracted) instance of a
rule and not on all other syntactically equal ones.
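
The difference between the two readings can be imitated in a few lines of Python. This is a deliberately crude toy, not the DLP-based semantics of LUPS: rules are plain strings, the only body the toy understands besides the empty one is the always-true `not bot`, and the functions model, syntactic and per_instance are hypothetical names that merely mimic the behaviour described above for a single event update followed by an empty one.

```python
def model(rules):
    """Heads of the rules in a toy KB whose bodies are empty or `not bot`."""
    heads = set()
    for rule in rules:
        head, _, body = rule.partition("<-")
        if body.strip() in ("", "not bot"):
            heads.add(head.strip())
    return heads


def syntactic(p1, event_rules):
    """Behaviour criticised in the text: undoing an event removes every
    syntactically equal rule, including older copies."""
    kb = set(p1) | set(event_rules)        # state 2: event rules asserted
    return model(kb - set(event_rules))    # state 3: removed by syntax alone


def per_instance(p1, event_rules):
    """Proposed reading: only the instance asserted by the event is undone."""
    # Each rule is tagged with the state at which it was asserted.
    kb = [(rule, 1) for rule in p1] + [(rule, 2) for rule in event_rules]
    # State 3: drop only the instances tagged with state 2 (all are events here).
    return model(rule for rule, state in kb if state != 2)


event = ["a <-"]
print(syntactic(["a <-"], event), per_instance(["a <-"], event))                  # Example 1
print(syntactic(["a <- not bot"], event), per_instance(["a <- not bot"], event))  # Example 2
```

Under the syntactic reading the two examples end in different states, whereas the per-instance reading treats them alike.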

The main contribution of this paper is a proposal for a change in the seman- 
tics of LUPS to correct this undesirable behaviour. 

The second contribution of this paper resides in the extension of the LUPS
syntax with the introduction of a persistent retract command. In LUPS, besides
the assertion and retraction commands, denoted by the keywords assert and
retract respectively, which may only contribute to the state transition for which
they were specified, there is a command, denoted by the keyword always, which
can be seen as a permanent assert command, i.e. until it is cancelled, it will
cause the assertion of a specific rule every time the specified condition holds. We
argue that such a persistent command should also exist for the retraction of rules.
To illustrate our claim, let us consider the following example from [2]:

Example 3. Consider the following scenario: once Republicans take over both
Congress and the Presidency they establish a law stating that abortions are
punishable by jail; once Democrats take over both Congress and the Presidency
they abolish such a law; in the meantime, there are no changes in the law
because always either the President or the Congress vetoes such changes. The
specification in LUPS, as presented in [2], comprises the following persistent
update commands^:

always $jail(X) \leftarrow abortion(X)$ when $repC, repP$ (1)

always $not\ jail(X) \leftarrow abortion(X)$ when $not\ repC, not\ repP$ (2)

The authors further state that in this example, alternatively, instead of the
second command they could have used a retract command:

retract $jail(X) \leftarrow abortion(X)$ when $not\ repC, not\ repP$ (3)

noting that, in this example, since there is no other rule implying jail, retracting
the rule is safely equivalent to retracting its conclusion.

We argue that neither of the proposed solutions, to represent the abolition 
of the abortion law when the Democrats take over (commands (2) and (3)), 
properly represents the intuition stated in the scenario. 

^ Where the rules with variables simply stand, as usual, for all the ground rules that 
result from replacing the variables by all the ground terms in the language. 
The argument against the use of command (2) is implicitly stated by the
authors of [2] when they say that “since there is no other rule implying jail,
retracting the rule is safely equivalent to retracting its conclusion”. In fact, if
there were other rules implying jail, which is fair to expect in a realistic scenario,
such as for example a rule stating that assassinations are punishable by jail, then
some undesirable effects would occur. Suppose such a rule was represented by an
update command, at the initial state, of the form:

assert $jail(X) \leftarrow assassination(X)$ (4)

If this command had been previously issued, then, after the Democrats take
over both Congress and the Presidency, someone (mary) who both assassinates
someone and performs an abortion would not be punished by jail. This is so
because the rule asserted by command (2) would reject, according to the DLP
semantics, the previously asserted rule specified by command (4). The resulting
knowledge base would have $not\ jail(mary)$ as a consequence.

The argument against the use of command (3) resides in the fact that, to
effectively represent the intended meaning, this command would have to belong
to every update, i.e. it would have to be explicitly added to the specification
of every state transition. Not only does this not add to the simplicity of the
language but, also, the fact that the representation of the effect of Republicans
taking over requires a single update command, while the representation of the
effect of Democrats taking over requires several update commands, one at each
update, is somewhat unintuitive. The introduction of the always command in [2]
was justified by the need to avoid such consecutive repetitions of the assert
command. We believe that such a persistent command should also exist for the
retraction of rules: a command that, until cancelled, retracts a specific rule from
the KB every time the specified condition holds. In this paper we introduce such
a command.

Throughout, to differentiate the original LUPS language and the extended 
and modified version here proposed, we refer to the latter as LUPS*. 

The remainder is structured as follows: First, in Sect. 2, we introduce the
syntax of the extended language LUPS*. In Sect. 3 we propose a semantics for
this extended language that corrects the above mentioned problem. In Sect. 4 we
illustrate with two examples. In Sect. 5 we compare both semantics by means of
a desirable property. We then conclude in Sect. 6.



2 LUPS* - Syntax

In this Section we formalize the language of LUPS*. We will keep all the
commands of LUPS, with a slight modification in the syntax of two of these
commands, which will be explained below, and introduce the above mentioned
persistent retract command and its corresponding cancellation command.

The language of LUPS* will contain the basic non-persistent commands
for the assertion and retraction of rules, denoted by the keywords assert and
retract. They are of the form:

assert $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (5)

retract $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (6)

Both these commands can be made persistent, i.e. until cancelled they are 
added to all subsequent updates. We will identify such persistent commands by 
prefixing the non-persistent commands with the keyword always. They are thus 
of the form: 



always assert $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (7)

always retract $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (8)

Note here the first modification in the syntax of a LUPS command. In LUPS, 
command (7) does not have the keyword assert, being identified by the keyword 
always alone. Since we are now introducing the persistent retraction command, 
this new notation becomes more symmetrical and therefore more intuitive. 

All the previous commands are inertial. To obtain the corresponding non- 
inertial commands one simply adds the keyword event to obtain the following 
commands: 



assert event $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (9)

retract event $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (10)

always assert event $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (11)

always retract event $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (12)



Just as in LUPS there is a cancellation command for the persistent command
always, denoted by the keyword cancel, so there will be one in LUPS*, but
now denoted by the keyword cancel assert. This simplifies the introduction of
a cancellation command for the persistent retraction command, which will be
denoted by the keyword cancel retract. These two commands are:

cancel assert $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (13)

cancel retract $L \leftarrow L_1, \ldots, L_k$ when $L_{k+1}, \ldots, L_m$ (14)

The syntax of the update commands of LUPS* is, formally: 

Definition 1 (LUPS* - Update Commands). A LUPS* update command,
or command for short, is a propositional expression of any of the forms^

[always] assert [event] $R$ when $\varphi$ (15)

[always] retract [event] $R$ when $\varphi$ (16)

cancel assert $R$ when $\varphi$ (17)

cancel retract $R$ when $\varphi$ (18)

where $R$ is a rule of the form $L \leftarrow L_1, \ldots, L_k$ and $\varphi$ is a (possibly empty)
conjunction of literals, $L_{k+1}, \ldots, L_m$, where $L$ and each $L_i$ are literals. If $\varphi$ is
empty, we simply omit the when keyword. We refer to those commands without
(resp. with) the keyword event as inertial (resp. non-inertial) commands. We
refer to those commands with (resp. without) the keyword always as persistent
(resp. non-persistent) commands.

^ Where [a] will be used for notational convenience denoting either the presence or
absence of a.

The following table summarizes the correspondence between the syntax of 
LUPS* and that of LUPS: 



LUPS*                      LUPS
assert [event]             assert [event]
retract [event]            retract [event]
always assert [event]      always [event]
always retract [event]     non existing
cancel assert              cancel
cancel retract             non existing



An update program in LUPS* is defined as follows: 

Definition 2 (LUPS* - Update Program). An update program in LUPS*
is a finite sequence of updates, where an update is a set of commands of the form
(15) to (18).

3 LUPS* - Semantics 

The semantics of LUPS* is defined by incrementally translating update programs 
into sequences of generalized logic programs and by considering the semantics 
of the DLP formed by them. 

Let $U = U_1 \otimes \cdots \otimes U_n$ be a LUPS* update program. At every
state t we determine the corresponding DLP, $T^*_t(U) = \mathcal{P}_t$.

The translation of a LUPS* program into a dynamic program is similar to
the one presented for LUPS. It is made by induction, starting from the empty
program $P_0$ and, for each update $U_t$, given the already built dynamic
program $P_0 \oplus \cdots \oplus P_{t-1}$, determining the resulting program
$P_0 \oplus \cdots \oplus P_{t-1} \oplus P_t$. To cope with persistent update
commands, as for the LUPS semantics, we also consider a set containing all
currently active persistent commands, although its definition must be extended
to deal with the newly introduced persistent retraction commands. As in LUPS,
the retraction of rules requires their unique identification. Therefore, the
language of the resulting dynamic logic program must be extended with a new
propositional variable "N(R)" for every rule R appearing in the original LUPS
program. To properly handle non-inertial commands, we also need to uniquely
associate those rules appearing in non-inertial commands with the states they
belong to. To this end, the language of the resulting dynamic logic program
must also be extended with a new propositional variable "Ev(R, S)" for every
rule R appearing in a non-inertial command in the original LUPS program, and
every state S.

We now present the translation for LUPS*:

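Definition 3 (Translation into dynamic logic programs). Let
$U = U_1 \otimes \cdots \otimes U_n$ be a LUPS* update program. The
corresponding dynamic logic program
$T^*(U) = \mathcal{P} = \mathcal{P}_n = P_0 \oplus \cdots \oplus P_n$ is
obtained by the following inductive construction, using at each step t an
auxiliary set of persistent commands $PC_t$:

Base step: $P_0 = \{\}$ with $PC_0 = \{\}$.

Inductive step: Let
$T^*_{t-1}(U) = \mathcal{P}_{t-1} = P_0 \oplus \cdots \oplus P_{t-1}$ with the
set of persistent commands $PC_{t-1}$ be the translation of
$\mathcal{U}_{t-1} = U_1 \otimes \cdots \otimes U_{t-1}$. The translation of
$\mathcal{U}_t = U_1 \otimes \cdots \otimes U_t$ is
$T^*_t(U) = \mathcal{P}_t = P_0 \oplus \cdots \oplus P_{t-1} \oplus P_t$ with
the set of persistent commands $PC_t$, where:

\[
\begin{aligned}
PC_t = PC_{t-1}
 &\cup \{\text{assert } R \text{ when } \phi : \text{always assert } R \text{ when } \phi \in U_t\} \\
 &\cup \{\text{retract } R \text{ when } \phi : \text{always retract } R \text{ when } \phi \in U_t\} \\
 &\cup \{\text{assert event } R \text{ when } \phi : \text{always assert event } R \text{ when } \phi \in U_t\} \\
 &\cup \{\text{retract event } R \text{ when } \phi : \text{always retract event } R \text{ when } \phi \in U_t\} \\
 &- \{\text{assert [event] } R \text{ when } \phi : \text{cancel assert } R \text{ when } \psi \in U_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \psi\} \\
 &- \{\text{assert [event] } R \text{ when } \phi : \text{always retract [event] } R \text{ when } \psi \in U_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \psi\} \\
 &- \{\text{retract [event] } R \text{ when } \phi : \text{cancel retract } R \text{ when } \psi \in U_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \psi\} \\
 &- \{\text{retract [event] } R \text{ when } \phi : \text{always assert [event] } R \text{ when } \psi \in U_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \psi\}
\end{aligned}
\]

\[
NU_t = U_t \cup PC_t
\]

\[
\begin{aligned}
P_t = {}
 & \{not\ N(R) \leftarrow \; : \; \text{retract } R \text{ when } \phi \in NU_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \phi\} \\
 &\cup \{N(R) \leftarrow ;\; H(R) \leftarrow B(R), N(R) \; : \; \text{assert } R \text{ when } \phi \in NU_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \phi\} \\
 &\cup \{H(R) \leftarrow B(R), Ev(R,t) \; : \; \text{assert event } R \text{ when } \phi \in NU_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \phi\} \\
 &\cup \{not\ N(R) \leftarrow Ev(R,t) \; : \; \text{retract event } R \text{ when } \phi \in NU_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \phi\} \\
 &\cup \{not\ Ev(R,t-1) \leftarrow ;\; Ev(R,t) \leftarrow\}
\end{aligned}
\]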

Note that the body of every rule R must be extended either with the predicate
N(R) or with the predicate Ev(R, t). The differences between this
transformation and the original LUPS transformation are the following:

- the modification in the definition of the set of active persistent commands,
  $PC_t$, to deal with the newly introduced persistent retraction command and
  the corresponding cancellation command;
- the modification in the definition of the generalized logic program at state
  t, $P_t$, to properly deal with non-inertial commands. This is achieved by
  treating inertial and non-inertial rules separately: the former are dealt
  with as in LUPS, while the latter are extended with the predicate Ev(R, t),
  which is only made true for the duration of one state. This is achieved
  simply by including the rules $not\ Ev(R, t-1) \leftarrow$ and
  $Ev(R, t) \leftarrow$ in the generalized logic program of every state.
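The bookkeeping of the set of persistent commands in Definition 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the tuple type `Cmd`, the function name `next_persistent`, and the callback `holds` (assumed to decide whether a condition is entailed by the dynamic logic program built up to state t-1) are all hypothetical, and rules are compared as plain strings.

```python
from typing import NamedTuple, Tuple


class Cmd(NamedTuple):
    kind: str                    # "assert", "retract", "cancel assert", "cancel retract"
    rule: str                    # the rule R as an opaque string
    when: Tuple[str, ...] = ()   # condition literals
    always: bool = False
    event: bool = False


def next_persistent(pc_prev, update, holds):
    """Return PC_t from PC_{t-1} and the update U_t.

    holds(cond) is an assumed oracle for entailment of the conjunction
    cond at state t-1; it must return True for the empty condition.
    """
    # Commands introduced with 'always' are stored without that keyword.
    added = {Cmd(c.kind, c.rule, c.when, False, c.event)
             for c in update if c.always}
    # Persistent assertions of R are dropped by 'cancel assert R' and by
    # 'always retract [event] R' whose condition holds; dually for retractions.
    drop = set()
    for c in update:
        if not holds(c.when):
            continue
        if c.kind == "cancel assert" or (c.always and c.kind == "retract"):
            drop.add(("assert", c.rule))
        if c.kind == "cancel retract" or (c.always and c.kind == "assert"):
            drop.add(("retract", c.rule))
    return frozenset(p for p in set(pc_prev) | added
                     if (p.kind, p.rule) not in drop)
```

The set $NU_t$ is then simply the union of the current update with the returned set.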

The semantics of LUPS* is, as expected: 
Definition 4 (LUPS* Semantics). Let $U = U_1 \otimes \cdots \otimes U_n$ be a
LUPS* program. A query holds $L_1, \dots, L_k$ at t? is true in U iff
$\bigoplus_t T^*(U) \models L_1, \dots, L_k$, or, equivalently, iff
$\bigoplus \mathcal{P}_t \models L_1, \dots, L_k$. By SM(U) we mean the set of
all stable models of $T^*(U)$ at state n, modulo the newly introduced literals
N(R) and Ev(R, S).
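As a small worked instance of the translation (the one-fact rule R and the two updates are chosen here purely for illustration), consider asserting a fact as an event and then performing an empty update:

```latex
% Illustrative instance; R = (a <-), U_1 and U_2 are assumed for the example.
\[
U_1 = \{\text{assert event } (a \leftarrow)\}, \qquad U_2 = \{\}
\]
\[
\begin{aligned}
P_0 &= \{\} \\
P_1 &= \{\, a \leftarrow Ev(R,1);\;\; not\ Ev(R,0) \leftarrow;\;\; Ev(R,1) \leftarrow \,\} \\
P_2 &= \{\, not\ Ev(R,1) \leftarrow;\;\; Ev(R,2) \leftarrow \,\}
\end{aligned}
\]
```

At state 1 the fact $Ev(R,1) \leftarrow$ is in force, so a holds. At state 2 that fact is rejected by the more recent rule $not\ Ev(R,1) \leftarrow$ in $P_2$, the body of $a \leftarrow Ev(R,1)$ is no longer true, and a is false by default, so the query holds not a at 2? is true. The event rule thus has an effect for exactly one state.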

4 Illustrative Examples 

The following example illustrates the difference in behaviour between LUPS
and LUPS*:

Example 4. Consider a public building with several floors. Initially, the
security policy of the building states that any person (P) is allowed into any
of its floors (F) only if they have a special permit for that floor. This can
be represented by the update $U_1$:

$U_1$ : assert ($allowed(P, F) \leftarrow permission(P, F)$)

        assert ($not\ allowed(P, F) \leftarrow not\ permission(P, F)$)

Later on, the administration decided to open a public relations office, to be
situated on the ground floor. They had to update the security policy which,
from then on, would allow any person onto the ground floor provided they have
some form of identification. Let us suppose this happened on the second day
(state), and is represented by the update $U_2$:

$U_2$ : assert ($allowed(P, ground) \leftarrow id(P)$)

Further down the line (on the third day), the administration decided to declare,
every now and then, an open day when everyone, for the duration of one
day, was allowed to visit the entire building, provided they have some form of
Id. This is represented by the update $U_3$:

$U_3$ : always assert event ($allowed(P, F) \leftarrow id(P)$) when $open\_day$

Suppose that both Mary and John have Ids and Mary has a permission for the
second floor, represented by the facts permission(mary, second), id(john) and
id(mary) in the KB. At day five there is an open day, represented by the update
at state 4:

$U_4$ : assert event ($open\_day \leftarrow$)

According to the LUPS* semantics, at state 4 Mary is allowed on the ground
and second floors, and John is only allowed on the ground floor; at state 5 both
Mary and John are allowed on all floors of the building; at state 6 and thereafter
everything is back as it was at state 4, with Mary being allowed on the ground
and second floors, and John only being allowed on the ground floor, as expected.
If this problem were to be tackled with the semantics of LUPS, everything would
be the same until (and including) state 5 but, at state 6, John and Mary (and
anybody else) would no longer be allowed on the ground floor, which seems
to be rather unintuitive. This is so because not only is the rule asserted by the
always assert event command in $U_3$ removed, but so is the rule asserted by
$U_2$, simply because it has the same syntax. □

We now show how Example 3 would be encoded in LUPS*.

Example 5. Consider the scenario of Example 3. The specification in LUPS*
would comprise the following persistent update commands:

always assert $jail(X) \leftarrow abortion(X)$ when $repC, repP$
always retract $jail(X) \leftarrow abortion(X)$ when $not\ repC, not\ repP$

If, as before, at the initial state we also have the following update command:

assert $jail(X) \leftarrow assassination(X)$

then, after the Democrats take over both Congress and the Presidency, if Mary
both assassinates someone and performs an abortion, she will now be punished by
jail, as intended. □

5 Comparison 

In this section we compare LUPS and LUPS* by means of a property that 
partially characterizes the desired behaviour of the semantics wrt. non-inertial 
commands. 

Before we express this property, we start with the definition of update equiv- 
alence for update programs, according to which two update programs are update 
equivalent iff their semantics coincides after an arbitrary number of updates with 
arbitrary update programs. Formally: 

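Definition 5 (Update Equivalence). Let $\mathcal{U}_1$ and $\mathcal{U}_2$ be
two update programs. We say that $\mathcal{U}_1$ and $\mathcal{U}_2$ are update
equivalent, denoted by $\mathcal{U}_1 \overset{\otimes}{=} \mathcal{U}_2$, iff
for every update program $\mathcal{Q}$, $\mathcal{U}_1 \otimes \mathcal{Q}$ is
semantically equivalent to $\mathcal{U}_2 \otimes \mathcal{Q}$, i.e.:

\[
SM(\mathcal{U}_1 \otimes \mathcal{Q}) = SM(\mathcal{U}_2 \otimes \mathcal{Q})
\]

where, if $\mathcal{R} = R_{r_1} \otimes \cdots \otimes R_{r_n}$ and
$\mathcal{S} = S_{s_1} \otimes \cdots \otimes S_{s_m}$ are two update programs,
by $\mathcal{R} \otimes \mathcal{S}$ we mean the update program
$R_{r_1} \otimes \cdots \otimes R_{r_n} \otimes S_{s_1} \otimes \cdots \otimes S_{s_m}$.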

With this property, we can now come back to our main claim. In particular, 
we claim that if we have a sequence of updates such that only non-inertial com- 
mands are executed, the knowledge state before the execution of such sequence 
of commands and the knowledge state after the execution of all such commands 
should be update equivalent. This means that the long term effect (more than 
one state) of non-inertial commands should only reside in their interaction with 
inertial commands, be they persistent or not, i.e., rules asserted or retracted
by non-inertial commands should only change the semantics at the next state
to allow and/or prevent the execution of other commands at that state, which 
may be inertial and therefore change the semantics at subsequent states. In par- 
ticular, if there is a sequence of updates without any inertial command to be 
executed, followed by a state transition specified by an empty update, then, after 
its execution the knowledge state should be equivalent to the initial one. In other 
words, we want the effect of non-inertial commands to be, in fact, non inertial. 
The referred empty update is necessary because the immediate effect of every 
non-inertial command holds for one state, i.e. we need an extra state transition 
for such effect to be removed. 

This is formally defined as follows: 

Definition 6 (Stability wrt. non-inertial commands). An update language
is stable wrt. non-inertial commands iff, for all update programs
$\mathcal{U}_1$ and $\mathcal{U}_2$ such that $\mathcal{U}_1$ consists of
updates with non-persistent commands only, and $\mathcal{U}_2$ consists of
updates with non-persistent, non-inertial commands only, we have:

\[
\mathcal{U}_1 \otimes U_\emptyset \overset{\otimes}{=} \mathcal{U}_1 \otimes \mathcal{U}_2 \otimes U_\emptyset
\]

where $U_\emptyset$ denotes an empty update.

The restriction imposed on the updates of $\mathcal{U}_1$ to only contain
non-persistent commands, i.e. without the keyword always, is justified by the
fact that if such persistent commands were present, they could be executed at
subsequent states, together with the updates of $\mathcal{U}_2$, thus
invalidating our goal to have only non-inertial commands being executed at the
state transitions corresponding to the updates of $\mathcal{U}_2$. Recall that
an inertial command is valid at all subsequent states, until cancelled. Note
that to express our intuition it suffices to guarantee that at the state
transitions corresponding to the updates of $\mathcal{U}_2$ only non-inertial
commands are executed. In fact, this would also be achieved by allowing
$\mathcal{U}_1$ to contain persistent non-inertial commands, as long as we
ensure that at the state transition corresponding to $U_\emptyset$ no commands
are executed, but we will keep to this simpler formulation.

Proposition 1. LUPS is not stable wrt. non-inertial commands. 

Example 1 above shows that LUPS is not stable wrt. non-inertial commands.
It is worth noting that the recently proposed language of updates EPI [5], being
based on the semantics of LUPS, is also not stable wrt. non-inertial commands.
The same example also applies to EPI.

Theorem 1. LUPS* is stable wrt. non-inertial commands. 

6 Conclusion 

In this paper we have drawn the attention of the reader to an intuitively incorrect
behaviour of the semantics of the language of updates LUPS, when dealing with
non-inertial commands. Hoping to have convinced the reader that such behaviour
is in fact incorrect, we have then proposed a modification to the semantics of
LUPS to correct this problem. Both semantics, the original and the modified,
were compared by means of a desirable property that only holds in the latter.

We have also suggested the need for a new persistent retraction command, 
symmetrical to the persistent assertion command, and extended the syntax and 
adapted the semantics accordingly. 

Each contribution was illustrated by means of an example. 
A Background



In this Appendix we provide some background on Generalized Logic Programs, 
Dynamic Logic Programming and LUPS. 



Generalized Logic Programs. Here we recapitulate the syntax and stable
semantics of generalized logic programs [1].

By a generalized logic program P in a language $\mathcal{L}$ we mean a finite
or infinite set of propositional clauses of the form
$L_0 \leftarrow L_1, \dots, L_n$ where each $L_i$ is a literal (i.e. an atom A
or the default negation of an atom not A). If r is a clause (or rule), by H(r)
we mean $L_0$, and by B(r) we mean $L_1, \dots, L_n$. If H(r) = A
(resp. H(r) = not A) then not H(r) = not A (resp. not H(r) = A). By a
(2-valued) interpretation M of $\mathcal{L}$ we mean any set of literals from
$\mathcal{L}$ that satisfies the condition that for any A, precisely one of the
literals A or not A belongs to M. Given an interpretation M we define
$M^+ = \{A : A \text{ is an atom}, A \in M\}$ and
$M^- = \{not\ A : A \text{ is an atom}, not\ A \in M\}$. Wherever convenient we
omit the default (negative) atoms when describing interpretations and models.
Also, rules with variables stand for the set of their ground instances. We say
that a (2-valued) interpretation M of $\mathcal{L}$ is a stable model of a
generalized logic program P if $p(M) = least(p(P) \cup p(M^-))$, where p(.)
univocally renames every default literal not A in a program or model into new
atoms, say not_A. In the remainder, we refer to a GLP simply as a logic
program (or LP).
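The stable model condition above can be checked by brute force on small ground programs. The sketch below is an illustration only: the function names and the encoding of a literal as either `"a"` or `"not a"` are assumptions, and the string `"not a"` itself plays the role of the renamed atom. Atom names are assumed not to start with `"not "`.

```python
from itertools import combinations


def least_model(rules):
    """Least model of a definite program given as (head, body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model


def stable_models(program, atoms):
    """Enumerate the stable models of a ground generalized logic program.

    Each model is returned as the set of its true atoms (the default
    literals are omitted, as in the text).
    """
    atoms = sorted(atoms)
    found = []
    for k in range(len(atoms) + 1):
        for true_atoms in combinations(atoms, k):
            # The candidate interpretation M, default literals included.
            m = set(true_atoms) | {f"not {a}" for a in atoms if a not in true_atoms}
            # M^- added as facts, then compare the least model with M.
            negatives = [(lit, ()) for lit in m if lit.startswith("not ")]
            if least_model(list(program) + negatives) == m:
                found.append(set(true_atoms))
    return found
```

For example, the program with the two rules a ← not b and b ← not a has the two stable models {a} and {b} under this check.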



Dynamic Logic Programming. Next we recall the semantics of dynamic logic
programming [1]. A dynamic logic program
$\mathcal{P} = \{P_s : s \in S\} = P_0 \oplus \cdots \oplus P_n \oplus \cdots$
is a finite or infinite sequence of LPs, indexed by the finite or infinite set
$S = \{1, 2, \dots, n, \dots\}$. Such a sequence may be viewed as the outcome of
updating $P_0$ with $P_1$, ..., updating it with $P_n$, ... As we will see, in
LUPS each $P_i$ is determined by the state transition. The role of dynamic
logic programming is to ensure that these newly added rules are in force, and
that previous rules are still valid (by inertia) for as long as they do not
conflict with more recent ones. The notion of dynamic logic program at state s,
denoted by $\bigoplus_s \mathcal{P} = P_0 \oplus \cdots \oplus P_s$,
characterizes the meaning of the dynamic logic program when queried at state
s, by means of its stable models, defined as follows:

Footnote (on generalized logic programs): The class of GLPs (i.e. logic
programs that allow default negation in the premises and heads of rules) can be
viewed as a special case of yet broader classes of programs, introduced earlier
in [7] and in [11], and, for the special case of normal programs, their
semantics coincides with the stable models semantics [6].

Definition 7 (Stable Models of DLP). Let $\mathcal{P} = \{P_s : s \in S\}$ be a
dynamic logic program, let $s \in S$. An interpretation $M_s$ is a stable model
of $\mathcal{P}$ at state s iff

\[
M_s = least\left([\mathcal{P}_s - Reject(s, M_s)] \cup Default(\mathcal{P}_s, M_s)\right)
\]

where

\[
\begin{aligned}
\mathcal{P}_s &= \textstyle\bigcup_{i \le s} P_i \\
Reject(s, M_s) &= \{r \in P_i : \exists r' \in P_j,\ i < j \le s,\ H(r) = not\ H(r') \wedge M_s \models B(r')\} \\
Default(\mathcal{P}_s, M_s) &= \{not\ A \mid \nexists r \in \mathcal{P}_s : (H(r) = A) \wedge M_s \models B(r)\}
\end{aligned}
\]

If some literal or conjunction of literals $\phi$ holds in all stable models of
$\mathcal{P}$ at state s, we write $\bigoplus_s \mathcal{P} \models \phi$. If s
is the largest element of S we simply write $\bigoplus \mathcal{P} \models \phi$.
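The definition of stable models at a state can likewise be explored by exhaustive search on small ground programs. The following sketch reads the formula literally, treating $\mathcal{P}_s$ as a set of rules; the function names, the list-of-programs representation with `programs[0]` standing for $P_0$, and the literal encoding (`"a"` or `"not a"`) are assumptions made for illustration.

```python
from itertools import combinations


def neg(lit):
    """Complement of a literal: 'a' <-> 'not a'."""
    return lit[4:] if lit.startswith("not ") else f"not {lit}"


def least_model(rules):
    """Least model of a definite program given as (head, body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model


def dlp_stable_models(programs, atoms, s):
    """Stable models at state s of programs[0], ..., programs[s].

    Rules are (head, body) pairs with body a tuple of literals; each
    model is returned as the set of its true atoms.
    """
    atoms = sorted(atoms)
    prefix = programs[:s + 1]
    union = {r for p in prefix for r in p}
    found = []
    for k in range(len(atoms) + 1):
        for true_atoms in combinations(atoms, k):
            m = set(true_atoms) | {f"not {a}" for a in atoms if a not in true_atoms}

            def sat(body):
                return all(b in m for b in body)

            # Rules rejected by a later rule with complementary head and true body.
            rejected = {r for i, p in enumerate(prefix) for r in p
                        if any(h2 == neg(r[0]) and sat(b2)
                               for q in prefix[i + 1:] for h2, b2 in q)}
            # Atoms with no rule (rejected or not) whose body is true are false by default.
            default = {(f"not {a}", ()) for a in atoms
                       if not any(h == a and sat(b) for h, b in union)}
            if least_model((union - rejected) | default) == m:
                found.append(set(true_atoms))
    return found
```

As a usage example, with $P_0 = \{\}$, $P_1 = \{a \leftarrow;\ b \leftarrow\}$ and $P_2 = \{not\ a \leftarrow b\}$, the only stable model at state 1 is {a, b}, while at state 2 the fact a ← is rejected and the only stable model is {b}.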



LUPS. Here we recall the language of updates LUPS, closely following its original
formulation in [2]. The object language of LUPS is that of generalized logic
programs. A sentence U in LUPS is a set of simultaneous update commands (or
actions) that, given a pre-existing sequence of logic programs
$P_0 \oplus \cdots \oplus P_n$ (i.e. a dynamic logic program), whose semantics
corresponds to our knowledge at a given state, produces a sequence with one
more program, $P_0 \oplus \cdots \oplus P_n \oplus P_{n+1}$,
corresponding to the knowledge that results from the previous sequence after
performing all the simultaneous commands. A program in LUPS is a sequence
of such sentences.

Given a program in LUPS, its semantics is defined by means of a dynamic 
logic program generated by the sequence of commands. 

In this update framework, knowledge evolves from one knowledge state to
another as a result of update commands stated in the object language. Without
loss of generality it is assumed that the initial knowledge state is empty. Given
the current knowledge state, its successor knowledge state is produced as a
result of the occurrence of a set U of simultaneous updates. The knowledge state
obtained by performing the sequence of updates $U_1, U_2, \dots, U_n$ is
denoted by $U_1 \otimes U_2 \otimes \cdots \otimes U_n$. Sequences of updates
so defined will be called update programs. In other words, an update program is
a finite sequence $\mathcal{U} = \{U_s : s \in S\}$ of updates indexed by the
set $S = \{1, 2, \dots, n\}$. Each update is a set of update
commands. Update commands (defined below) specify assertions or retractions
to the current knowledge state. By the current knowledge state we mean the one
resulting from the last update performed.

Knowledge can be queried at any state $t \le n$, where n is the index of the
current knowledge state. A query will be denoted by:

holds $L_1, \dots, L_k$ at t?

and is true iff the conjunction of its literals holds at the state obtained
after the t-th update. If t = n, the state reference "at t" is skipped.

Update commands specify assertions or retractions to the current knowledge
state. In LUPS a simple assertion is represented by the command:

assert $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (19)

Its meaning is that if $L_{k+1}, \dots, L_m$ is true in the current state, then
the rule $L \leftarrow L_1, \dots, L_k$ is added to its successor state, and
persists by inertia, until possibly retracted or overridden by some future
update command.

In order to represent rules and facts that do not persist by inertia, i.e. that
are one-state events, LUPS includes the modified form of assertion:

assert event $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (20)

The retraction of rules is performed with the two update commands:

retract $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (21)

retract event $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (22)

Their meaning is that, subject to the precondition $L_{k+1}, \dots, L_m$
(verified at the current state), the rule $L \leftarrow L_1, \dots, L_k$ is
either retracted from its successor state onwards, or just temporarily
retracted in the successor state (if governed by event).

Normally assertions represent newly incoming information. Although its ef- 
fects remain by inertia (until contravened or retracted), the assert command 
itself does not persist. However, some update commands may desirably persist 
in the successive consecutive updates. This is especially the case of laws which, 
subject to some preconditions, are always valid, or of rules describing the effects 
of an action. In the former case, the update command must be added to all sets 
of updates, to guarantee that the rule remains indeed valid. In the latter case, 
the specification of the effects must be added to all sets of updates, to guarantee 
that, when the action takes place, its effects are enforced. 

To specify such persistent update commands, LUPS introduces the following
commands:

always $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (23)

always event $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (24)

cancel $L \leftarrow L_1, \dots, L_k$ when $L_{k+1}, \dots, L_m$   (25)

The first two statements mean that, in addition to any new set of arriving
update commands, the persistent update commands keep executing along with
them: in the first case without, and in the second case with, the event
keyword. The third statement cancels the execution of this persistent update,
once the conditions for cancellation are met.

Definition 8 (LUPS). An update program in LUPS is a finite sequence of
updates, where an update is a set of commands of the form (19) to (25).

The semantics of LUPS is defined by incrementally translating update
programs into sequences of generalized logic programs and by considering the
semantics of the DLP formed by them.

Let $U = U_1 \otimes \cdots \otimes U_n$ be a LUPS program. At every state t
the corresponding DLP, $T_t(U) = \mathcal{P}_t$, is determined.

The translation of a LUPS program into a dynamic program is made by
induction, starting from the empty program $P_0$, and for each update $U_t$,
given the already built dynamic program $P_0 \oplus \cdots \oplus P_{t-1}$,
determining the resulting program
$P_0 \oplus \cdots \oplus P_{t-1} \oplus P_t$. To cope with persistent update
commands, associated with every dynamic program in the inductive construction,
a set containing all currently active persistent commands is considered, i.e.
all those commands that were not cancelled until that point in the
construction, from the time they were introduced. To be able to retract rules,
a unique identification of each such rule is needed. This is achieved by
augmenting the language of the resulting dynamic program with a new
propositional variable "N(R)" for every rule R appearing in the original LUPS
program.

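Definition 9 (Translation into dynamic logic programs). Let
$U = U_1 \otimes \cdots \otimes U_n$ be an update program. The corresponding
dynamic logic program $T(U) = \mathcal{P} = P_0 \oplus \cdots \oplus P_n$ is
obtained by the following inductive construction, using at each step t an
auxiliary set of persistent commands $PC_t$:

Base step: $P_0 = \{\}$ with $PC_0 = \{\}$.

Inductive step: Let
$T_{t-1}(U) = \mathcal{P}_{t-1} = P_0 \oplus \cdots \oplus P_{t-1}$ with the
set of persistent commands $PC_{t-1}$ be the translation of
$\mathcal{U}_{t-1} = U_1 \otimes \cdots \otimes U_{t-1}$. The translation of
$\mathcal{U}_t = U_1 \otimes \cdots \otimes U_t$ is
$T_t(U) = \mathcal{P}_t = P_0 \oplus \cdots \oplus P_{t-1} \oplus P_t$ with the
set of persistent commands $PC_t$, where:

\[
\begin{aligned}
PC_t = PC_{t-1}
 &\cup \{\text{assert } R \text{ when } \phi : \text{always } R \text{ when } \phi \in U_t\} \\
 &\cup \{\text{assert event } R \text{ when } \phi : \text{always event } R \text{ when } \phi \in U_t\} \\
 &- \{\text{assert [event] } R \text{ when } \phi : \text{cancel } R \text{ when } \psi \in U_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \psi\} \\
 &- \{\text{assert [event] } R \text{ when } \phi : \text{retract } R \text{ when } \psi \in U_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \psi\}
\end{aligned}
\]

\[
NU_t = U_t \cup PC_t
\]

\[
\begin{aligned}
P_t = {}
 & \{N(R) \leftarrow ;\; H(R) \leftarrow B(R), N(R) \; : \; \text{assert [event] } R \text{ when } \phi \in NU_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \phi\} \\
 &\cup \{not\ N(R) \leftarrow \; : \; \text{retract [event] } R \text{ when } \phi \in NU_t \wedge \textstyle\bigoplus \mathcal{P}_{t-1} \models \phi\} \\
 &\cup \{not\ N(R) \leftarrow \; : \; \text{assert event } R \text{ when } \phi \in NU_{t-1} \wedge \textstyle\bigoplus \mathcal{P}_{t-2} \models \phi\} \\
 &\cup \{N(R) \leftarrow \; : \; \text{retract event } R \text{ when } \phi \in NU_{t-1} \wedge \textstyle\bigoplus \mathcal{P}_{t-2} \models \phi\}
\end{aligned}
\]

Definition 10 (LUPS Semantics). Let U be an update program. A query

holds $L_1, \dots, L_n$ at t

is true in U iff $\bigoplus_t T(U) \models L_1, \dots, L_n$.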
Abstract. This paper explores the applicability of the new paradigm of
Multi-dimensional Dynamic Logic Programming to represent an agent's
view of the combination of societal knowledge dynamics. The
representation of a dynamic society of agents is the core of MINERVA [11], an
agent architecture and system designed with the intention of providing
a common agent framework based on the unique strengths of Logic
Programming, that allows the combination of several non-monotonic
knowledge representation and reasoning mechanisms developed in recent years.



1 Introduction 

Over recent years, the notion of agency has claimed a major role in defining
the trends of modern research. Influencing a broad spectrum of disciplines such
as Sociology and Psychology, among others, the agent paradigm has virtually
invaded every sub-field of Computer Science [3,8,16]. Although commonly
implemented by means of imperative languages, mainly for reasons of efficiency,
the agent concept has recently increased its influence in the research and
development of computational logic based systems. Since efficiency is not
always the crucial issue, but clear specification and correctness are, Logic
Programming and Non-monotonic Reasoning have been brought back into the
spotlight.

The Logic Programming paradigm provides a well-defined, general, integrative,
encompassing, and rigorous framework for systematically studying computation,
be it syntax, semantics, procedures, or attending implementations,
environments, tools, and standards. LP approaches problems, and provides
solutions, at a sufficient level of abstraction so that they generalize from
problem domain to problem domain. This is afforded by the nature of its very
foundation in logic, both in substance and method, and constitutes one of its
major assets. To this are added the recent significant improvements in the
efficiency of Logic Programming implementations for Non-monotonic Reasoning
(e.g. [14,18]). Besides allowing for a unified declarative and procedural
semantics, eliminating the traditional wide gap between theory and practice,
the use of several quite powerful results in the field of non-monotonic
extensions to Logic Programming (LP), such as belief revision, inductive
learning, argumentation, preferences, abduction, etc. [16], can represent an
important composite added value to the design of rational agents.

Until recently, Logic Programming could be seen as a good representation 
language for static knowledge. If we are to move to a more open and dynamic 
environment, typical of the agency paradigm, we need to consider ways of rep- 
resenting and integrating knowledge from different sources which may evolve in 
time. Moreover, an agent not only comprises knowledge about states, but also 
some form of knowledge about the transitions between states. This knowledge 
about state transitions can represent the agent’s knowledge about the environ- 
ment’s evolution, as well as its own behaviour and evolution. Since logic programs 
describe knowledge states, it’s only fit that logic programs describe transitions 
of knowledge states as well. It is natural to associate with each state a set of 
transition rules to obtain the next state. Recent developments have opened Logic 
Programming to these otherwise unreachable dynamic worlds [1,4,6,17,19]. 

In [1], the authors, with others, introduced Dynamic Logic Programming.
There, they studied and defined the declarative and operational semantics of
sequences of logic programs (or dynamic logic programs). Each program in the
sequence contains knowledge about some given state, where different states may,
for example, represent different time periods or different sets of priorities. The
introduction of Dynamic Logic Programming has extended Logic Programming,
making it possible for a logic program to undergo a sequence of modifications,
opening up the possibility of incremental design and evolution of logic
programs, therefore significantly facilitating modularization of logic
programming and, thus, modularization of non-monotonic reasoning as a whole.

In [2], the authors, with others, introduced the language LUPS - “Language 
for dynamic updates” - designed for specifying changes to logic programs. Given 
an initial knowledge base (as a logic program) LUPS provides a way for sequen- 
tially updating it, unifying states and state transitions into a single declarative 
logic based framework. 

Even though the main motivation behind the introduction of Dynamic Logic
Programming was to represent the evolution of knowledge in time, the
relationship between the different states can encode other aspects of a system,
as explored in [1,9,5,15,12]. Although Dynamic Logic Programming can represent
several states in one evolving dimension or aspect of a system, no more than
one such aspectual evolution can be encoded and combined simultaneously. This
is so because Dynamic Logic Programming is defined only for linear sequences
of states. Multi-dimensional Dynamic Logic Programming (MDLP) [10] was
introduced to generalize DLP to allow for collections of states represented by
arbitrary acyclic digraphs (DAG), not just sequences of states. MDLP assigns
semantics to sets and subsets of logic programs, depending on how they stand
in relation to one another, as defined by the DAG that represents the states and
their configuration. By dint of such natural generalization, MDLP affords extra
expressiveness, thereby enlarging the latitude of logic programming applications
unifiable under a single framework. The flexibility provided by a DAG ensures a
wide scope and variety of new possibilities. By virtue of the newly added
characteristics of multiplicity and composition, MDLP provides a "societal"
viewpoint in Logic Programming, important in these web and agent days, for
combining knowledge in general.

In this paper we explore the application of the new paradigm of Multi- 
dimensional Dynamic Logic Programming to represent an agent’s view of the 
combination of societal knowledge dynamics, i.e. the agent’s view of the evolu- 
tion of its knowledge as a result of knowledge evolution in the community of 
agents. 

We begin with a brief overview of DLP in Section 2. In Section 3, we present
MDLP. In Section 4 we explore the application of MDLP to represent inter
and intra-agent relationships and their views of a multi-agent system. We then
conclude in Section 6.



2 Background 

We start with an overview of the syntax and semantics of generalized logic pro- 
grams, followed by a short recap of the paradigm of Dynamic Logic Programming. 



2.1 Generalized Logic Programs and Their Stable Models 

To represent negative information in logic programs and in their updates, since
we need to allow default negation not A not only in premises of their clauses
but also in their heads, we use generalized logic programs as defined in [1]
(see the footnote below). By a generalized logic program P in a language
$\mathcal{L}$ we mean a finite or infinite set of propositional clauses of the
form $L_0 \leftarrow L_1, \dots, L_n$ where each $L_i$ is a literal (i.e. an
atom A or the default negation of an atom not A). If r is a clause (or rule),
by H(r) we mean $L_0$, and by B(r) we mean $L_1, \dots, L_n$. If H(r) = A
(resp. H(r) = not A) then not H(r) = not A (resp. not H(r) = A). By a
(2-valued) interpretation M of $\mathcal{L}$ we mean any set of literals from
$\mathcal{L}$ that satisfies the condition that for any A, precisely one of the
literals A or not A belongs to M. Given an interpretation M we define
$M^+ = \{A : A \text{ is an atom}, A \in M\}$ and
$M^- = \{not\ A : A \text{ is an atom}, not\ A \in M\}$. Following established
tradition, wherever convenient we omit the default (negative) atoms when
describing interpretations and models. We say that a (2-valued) interpretation
M of $\mathcal{L}$ is a stable model of a generalized logic program P if
$r(M) = least(r(P) \cup r(M^-))$, where r(.) univocally renames every default
literal not A in a program or model into new atoms, say not_A. The class of
generalized logic programs can be viewed as a special case of yet broader
classes of programs, introduced earlier in [13], and, for the special case of
normal programs, their semantics coincides with the stable models semantics [7].

Footnote: In [2] the reader can find the motivation for the usage of
generalized logic programs, instead of using simple denials as a result of
freely moving the head nots into the body.
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2.2 Dynamic Logic Programming 

Next we recall the semantics of dynamic logic programming [1]. A dynamic logic
program is a sequence $P_0 \oplus \cdots \oplus P_n \oplus \cdots$ (also denoted
by $\bigoplus \mathcal{P}$, where $\mathcal{P} = \{P_s : s \in S\}$ is a finite
or infinite sequence of LPs, indexed by the finite or infinite set
$S = \{1, 2, \dots, n, \dots\}$). Such a sequence may be viewed as the outcome of
updating $P_0$ with $P_1$, ..., updating it with $P_n$, ... As we will see in
the following sections, each $P_i$ is determined by the state transition. The
role of dynamic logic programming is to ensure that these newly added rules are
in force, and that previous rules are still valid (by inertia) for as long as
they do not conflict with more recent ones, whenever the latter remain in force
themselves. The notion of dynamic logic program at state $s$, denoted by
$\bigoplus_s \mathcal{P}$, characterizes the meaning of the dynamic logic
program when queried at state $s$, by means of its stable models, defined as
follows:

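Definition 1 (Stable Models of DLP). Let
$\bigoplus \mathcal{P} = \bigoplus \{P_s : s \in S\}$ be a dynamic logic
program, and let $s \in S$. An interpretation $M_s$ is a stable model of
$\bigoplus \mathcal{P}$ at state $s$ iff

$$M_s = least([\mathcal{P}_s - Reject(s, M_s)] \cup Default(\mathcal{P}_s, M_s))$$

where:

$$\mathcal{P}_s = \bigcup_{i \le s} P_i$$

$$Reject(s, M_s) = \{r \in P_i : \exists r' \in P_j,\ i < j \le s,\ H(r) = not\ H(r') \wedge M_s \models B(r')\}$$

$$Default(\mathcal{P}_s, M_s) = \{not\ A : \nexists r \in \mathcal{P}_s,\ H(r) = A \wedge M_s \models B(r)\}$$

and where $A$ is an atom.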
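
As a small illustration of Definition 1 (not part of the original text; the two
programs are assumptions chosen for the example), take
$P_1 = \{a \leftarrow;\ b \leftarrow a\}$ updated by $P_2 = \{not\ a \leftarrow\}$,
and test the candidate $M_2 = \{not\ a, not\ b\}$ at state 2:

```latex
\begin{align*}
\mathcal{P}_2 &= P_1 \cup P_2 = \{a \leftarrow;\ b \leftarrow a;\ not\ a \leftarrow\} \\
Reject(2, M_2) &= \{a \leftarrow\}
  && \text{($not\ a \leftarrow$ in $P_2$ has a true (empty) body)} \\
Default(\mathcal{P}_2, M_2) &= \{not\ b\}
  && \text{(the only rule for $b$ has body $a$, false in $M_2$)} \\
least(\{b \leftarrow a;\ not\ a \leftarrow\} \cup \{not\ b\}) &= \{not\ a, not\ b\} = M_2
\end{align*}
```

Hence $M_2$ is a stable model at state 2. Note that $not\ a$ is not obtained by
default (the rule $a \leftarrow$ still belongs to $\mathcal{P}_2$ and has a true
body) but from the newer rule in $P_2$. At state 1 nothing is rejected and the
stable model is $\{a, b\}$.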

3 Multi-dimensional Dynamic Logic Programming 

Even though the main motivation behind the introduction of DLP was to rep- 
resent the evolution of knowledge in time, the relationship between the different 
states can encode other aspects of a system, as pointed out in [1]. In fact, since 
its introduction, DLP (and LUPS) has been employed to represent a stock of 
features of a system, namely as a means to: represent and reason about the evo- 
lution of knowledge in time [1] ; combine rules learnt by a diversity of agents [9] ; 
reason about updates of agents’ beliefs [5]; model agent interaction [15]; model 
and reason about actions [1]; resolve inconsistencies in metaphorical reasoning 
[12]. 

The common feature among these applications of DLP is that the states
associated with the given set of theories encode only one of several possible
representational dimensions (e.g. time, hierarchies, domains, ...), inasmuch as
DLP is defined for linear sequences of states alone. For example, DLP can be
used to model the relationships within a group of agents related by a strict
hierarchy, and DLP can be used to model the evolution of a single agent over
time. But DLP, as it stands, cannot deal with both settings at once, and model
the evolution of one such group of agents over time.

In effect, knowledge updating is not to be simply envisaged as taking place in
the time dimension alone. Several updating dimensions may combine
simultaneously, with or without the temporal one, such as specificity (as in
taxonomies), strength of the updating instance (as in the legislative domain),
hierarchical position of the knowledge source (as in organizations), credibility
of the source (as in uncertain, mined, or learnt knowledge), or opinion
precedence (as in a society of agents). For this to be possible, DLP needs to be
extended to allow for a more general structure of states.

In this section we present the notion of Multi-dimensional Dynamic Logic
Programming (MDLP) (introduced in [10]) which generalizes DLP to allow for
collections of states represented by arbitrary acyclic digraphs. In this
setting, MDLP assigns semantics to sets and subsets of logic programs, depending
on how they relate to one another, these relations being defined by the acyclic
digraph representing the states.



3.1 Graphs 

A directed graph, or digraph, $D = (V, E)$ is a pair of two finite or infinite
sets $V = V_D$ of vertices and $E = E_D$ of pairs of vertices or (directed)
edges. A directed edge sequence from $v_0$ to $v_n$ in a digraph is a sequence
of edges $e_1, e_2, \dots, e_n \in E_D$ such that $e_i = (v_{i-1}, v_i)$ for
$i = 1, \dots, n$. A directed path is a directed edge sequence in which all the
edges are distinct. A directed acyclic graph, or acyclic digraph (DAG), is a
digraph $D$ such that there are no directed edge sequences from $v$ to $v$, for
all vertices $v$ of $D$. A source is a vertex with in-valency 0 (number of edges
for which it is a final vertex) and a sink is a vertex with out-valency 0
(number of edges for which it is an initial vertex). We say that $v < w$ if
there is a directed path from $v$ to $w$ and that $v \le w$ if $v < w$ or
$v = w$. The relevancy DAG of a DAG $D$ wrt a vertex $v$ of $D$ is
$D_v = (V_v, E_v)$ where $V_v = \{v_i : v_i \in V \text{ and } v_i \le v\}$ and
$E_v = \{(v_i, v_j) : (v_i, v_j) \in E \text{ and } v_i, v_j \in V_v\}$. The
relevancy DAG of a DAG $D$ wrt a set of vertices $S$ of $D$ is
$D_S = (V_S, E_S)$ where $V_S = \bigcup_{v \in S} V_v$ and
$E_S = \bigcup_{v \in S} E_v$, where $D_v = (V_v, E_v)$ is the relevancy DAG of
$D$ wrt $v$.
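
The relevancy DAG is simply the sub-graph induced by the ancestors of the chosen
vertices. A minimal Python sketch under that reading follows; the encoding of a
DAG as a set of (source, target) pairs and the function names are assumptions
made for this illustration.

```python
def ancestors_or_self(edges, v):
    """All vertices u with u <= v: u == v, or a directed path leads from u to v."""
    preds = {}
    for a, b in edges:
        preds.setdefault(b, set()).add(a)
    seen, stack = {v}, [v]
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def relevancy_dag(edges, targets):
    """Relevancy DAG wrt a set of vertices: union of the per-vertex relevancy DAGs."""
    vertices, kept_edges = set(), set()
    for v in targets:
        v_v = ancestors_or_self(edges, v)
        vertices |= v_v
        kept_edges |= {(a, b) for a, b in edges if a in v_v and b in v_v}
    return vertices, kept_edges

edges = {("a", "c"), ("b", "c"), ("c", "d"), ("b", "e")}
v_d, e_d = relevancy_dag(edges, {"d"})        # everything except e
v_s, e_s = relevancy_dag(edges, {"a", "e"})   # a alone, plus b -> e
```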

3.2 Declarative Semantics 

We start by defining the framework consisting of the generalized logic programs 
indexed by a DAG. Throughout this paper, we will restrict ourselves to DAG’s 
such that for every vertex v of the DAG, any path ending in v is finite. 

Definition 2 (Multi-dimensional Dynamic Logic Program). Let $\mathcal{L}$ be a
propositional language. A Multi-dimensional Dynamic Logic Program (MDLP),
$\mathcal{P}$, is a pair $(\mathcal{P}_D, D)$ where $D = (V, E)$ is a DAG and
$\mathcal{P}_D = \{P_v : v \in V\}$ is a set of generalized logic programs in
the language $\mathcal{L}$, indexed by the vertices $v \in V$ of $D$. We call
states such vertices of $D$. For simplicity, we often leave the language
$\mathcal{L}$ implicit.

To characterize the models of $\mathcal{P}$ at any given state we will keep to
the basic intuition of logic program updates, whereby an interpretation is a
stable model of the update of a program $P$ by a program $U$ iff it is a stable
model of a program consisting of the rules of $U$ together with a subset of the
rules of $P$, namely those that are not rejected, i.e. those that carry over by
inertia because they are not overridden by program $U$. With the introduction of
a DAG to index programs, a program may have more than a single ancestor. This
has to be dealt with, the desired intuition being that a program
$P_v \in \mathcal{P}_D$ can be used to reject rules of any program
$P_u \in \mathcal{P}_D$ if there is a directed path from $u$ to $v$. Moreover,
if some atom is defined neither in the update nor in any of its ancestors, its
negation is assumed by default. Formally, the stable models of the MDLP are:

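Definition 3 (Stable Models at state s). Let $\mathcal{P} = (\mathcal{P}_D, D)$
be a MDLP, where $\mathcal{P}_D = \{P_v : v \in V\}$ and $D = (V, E)$, and let
$s \in V$. An interpretation $M_s$ is a stable model of $\mathcal{P}$ at state
$s$ iff

$$M_s = least([\mathcal{P}_s - Reject(s, M_s)] \cup Default(\mathcal{P}_s, M_s))$$

where:

$$\mathcal{P}_s = \bigcup_{i \le s} P_i$$

$$Reject(s, M_s) = \{r \in P_i \mid \exists r' \in P_j,\ i < j \le s,\ H(r) = not\ H(r') \wedge M_s \models B(r')\}$$

$$Default(\mathcal{P}_s, M_s) = \{not\ A \mid \nexists r \in \mathcal{P}_s : (H(r) = A) \wedge M_s \models B(r)\}$$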



Intuitively, the set $Reject(s, M_s)$ contains those rules belonging to a
program indexed by a state $i$ that are overridden by the head of another rule
with true body in state $j$ along a path to state $s$. $\mathcal{P}_s$ contains
all rules of all programs that are indexed by a state along all paths to state
$s$, i.e. all rules that are potentially relevant to determine the semantics at
state $s$. The set $Default(\mathcal{P}_s, M_s)$ contains default negations
$not\ A$ of all unsupported atoms $A$, i.e., those atoms $A$ for which there is
no rule in $\mathcal{P}_s$ whose body is true in $M_s$.

According to [10], to determine the models of a MDLP at state s, we need 
only consider the part of the MDLP corresponding to the relevancy graph wrt 
state s. 
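
To make Definition 3 concrete, here is a brute-force Python sketch. It is an
illustration under stated assumptions, not the transformation-based
implementation of [10]: literals are the strings "a" and "not a", programs are a
dict from vertex to a list of (head, body) rules, every vertex mentioned in an
edge also indexes a (possibly empty) program, and all function names are
hypothetical.

```python
from itertools import product

def neg(lit):
    """Default complement: neg("a") == "not a" and neg("not a") == "a"."""
    return lit[4:] if lit.startswith("not ") else "not " + lit

def least(rules):
    """Least model of a definite program whose atoms are opaque strings."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def paths(edges, vertices):
    """All pairs (i, j) such that a directed path leads from i to j (i < j)."""
    succ = {v: set() for v in vertices}
    for a, b in edges:
        succ[a].add(b)
    lt = set()
    for start in vertices:
        stack = list(succ[start])
        while stack:
            node = stack.pop()
            if (start, node) not in lt:
                lt.add((start, node))
                stack.extend(succ[node])
    return lt

def stable_models_at(programs, edges, s, atoms):
    lt = paths(edges, set(programs))
    relevant = [v for v in programs if v == s or (v, s) in lt]
    tagged = [(v, r) for v in relevant for r in programs[v]]  # P_s, with origin
    found = []
    for signs in product([True, False], repeat=len(atoms)):
        m = {a if keep else neg(a) for a, keep in zip(atoms, signs)}

        def holds(body):
            return all(b in m for b in body)

        # A rule of P_i is rejected by a rule of P_j, i < j, with complementary
        # head and a body true in m (j <= s holds because j is relevant).
        rejected = [(i, r) for i, r in tagged
                    if any((i, j) in lt and r[0] == neg(r2[0]) and holds(r2[1])
                           for j, r2 in tagged)]
        default = [(neg(a), ()) for a in atoms
                   if not any(h == a and holds(b) for _, (h, b) in tagged)]
        kept = [r for (i, r) in tagged if (i, r) not in rejected]
        if least(kept + default) == m:
            found.append(m)
    return found

programs = {
    "old": [("fly", ())],
    "boss": [("not fly", ("penguin",)), ("penguin", ())],
    "now": [],
}
unrelated = {("old", "now"), ("boss", "now")}
ordered = unrelated | {("old", "boss")}
no_models = stable_models_at(programs, unrelated, "now", ["fly", "penguin"])
one_model = stable_models_at(programs, ordered, "now", ["fly", "penguin"])
```

When "old" and "boss" are unrelated, neither can reject the other's rules, the
least model contains both fly and not fly, and no interpretation qualifies; once
the edge from "old" to "boss" is added, the older rule is rejected.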

We might have a situation where we desire to determine the semantics jointly 
at more than one state. If all these states belong to the relevancy graph of one 
of them, we simply determine the models at that state. But this might not be 
the case. Formally, the semantics of a MDLP at an arbitrary set of its states is 
determined by the definition: 



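Definition 4 (Stable Models at a set of states S). Let
$\mathcal{P} = (\mathcal{P}_D, D)$ be a MDLP, where
$\mathcal{P}_D = \{P_v : v \in V\}$ and $D = (V, E)$. Let $S$ be a set of states
such that $S \subseteq V$. An interpretation $M_S$ is a stable model of
$\mathcal{P}$ at the set of states $S$ iff

$$M_S = least([\mathcal{P}_S - Reject(S, M_S)] \cup Default(\mathcal{P}_S, M_S))$$

where:

$$\mathcal{P}_S = \bigcup_{s \in S} \Big( \bigcup_{i \le s} P_i \Big)$$

$$Reject(S, M_S) = \{r \in P_i \mid \exists s \in S,\ \exists r' \in P_j,\ i < j \le s,\ H(r) = not\ H(r') \wedge M_S \models B(r')\}$$

$$Default(\mathcal{P}_S, M_S) = \{not\ A \mid \nexists r \in \mathcal{P}_S : (H(r) = A) \wedge M_S \models B(r)\}$$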



This is equivalent to the addition of a new vertex $\alpha$ to the DAG, and
connecting to $\alpha$, by addition of edges, all states we wish to consider.
Furthermore, the program indexed by $\alpha$ is empty. We then determine the
stable models of this new MDLP at state $\alpha$. Note that the addition of
state $\alpha$ does not affect the stable models at other states. Indeed,
$\alpha$ and the newly introduced edges do not belong to the relevancy DAG wrt
any other state. A particular case of the above definition is when $S = V$,
corresponding to the semantics of the whole MDLP. In [10], we have presented an
alternative definition, based on a purely syntactical transformation that, given
a MDLP, produces a generalized logic program whose stable models are in a
one-to-one correspondence with the stable models of the MDLP previously
characterized. The computation of the stable models at some state $s$ reduces to
the computation of the transformation followed by the computation of the stable
models of the transformed program. This directly provides for an implementation
of MDLP, publicly available at centria.di.fct.unl.pt/~jja/updates.
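
The graph-level part of this construction can be sketched in a few lines of
Python. The sketch assumes a DAG encoded as a set of (source, target) pairs; the
vertex name "sink" and the function names are hypothetical. It checks the two
claims made above: the fresh sink sees exactly the states relevant to the chosen
set, and the relevant vertices of every original state are unchanged.

```python
def ancestors_or_self(edges, v):
    """All vertices u with u == v or with a directed path from u to v."""
    seen, stack = {v}, [v]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if b == node and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen

def add_sink(edges, states, sink="sink"):
    """Connect every chosen state to a fresh vertex that indexes an empty program."""
    return set(edges) | {(s, sink) for s in states}

edges = {("a", "c"), ("b", "c"), ("b", "e")}
chosen = {"c", "e"}
extended = add_sink(edges, chosen)

seen_from_sink = ancestors_or_self(extended, "sink") - {"sink"}
unchanged = all(
    ancestors_or_self(extended, v) == ancestors_or_self(edges, v)
    for v in {"a", "b", "c", "e"}
)
```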



4 Inter- and Intra-Agent Social Viewpoints 

The previous section contains the definition of the notion of Multi-dimensional
Dynamic Logic Programming, MDLP, as an extension of DLP to allow for states
to be related by an arbitrary DAG. The stable models of MDLP have been
characterized but nothing has yet been explained as to how to use such DAGs to
represent real problems. In particular, we have not shown how DAGs allow for the
combination of more than one representational dimension, the very motivation
to introduce MDLP. Here, we explore some particular classes of DAGs suitable
in the context of multi-agent systems.

Agents are situated and therefore need to represent and reason about infor- 
mation they obtain directly by sensing the environment or communicated by 
other agents. These agents, as well as the environment, evolve in time, i.e. the 
incoming information is to be used as an update over existing knowledge. More- 
over these agents do not have the same credibility, this being represented via a 
hierarchy of predominance. In this section we explore DAGs that provide a way 
to represent the evolution in time of knowledge with provenance in a community 
of hierarchically related agents. 

We start with an agent $\alpha$, situated in a community of agents represented
by the Greek letters $\beta, \gamma, \mu, \nu$. The multi-agent system is
$A = \{\alpha, \beta, \gamma, \mu, \nu\}$. According to agent $\alpha$'s
hierarchical view of the world, and its position within the community, all
agents are related according to the DAG $D_h = (A, E_h)$ where
Eh = {(g l) , iP, m) . {l, l) , (l, a) , ( 7 . a)}, depicted in Fig. 1 a). 

According to this DAG, agent $\alpha$'s opinions prevail over those of every
other agent. However, this need not be so. If, for example, the role of one of
these agents was to coordinate the community, it would be natural for there to
be an edge connecting $\alpha$ to this agent.

In a static environment, this representation would be sufficient to determine
the semantics of $\alpha$'s view of the community. In such a situation, the
rules asserted by each agent would constitute programs indexed by the DAG of
Fig. 1 a), i.e.
Pf},P-r,Phy- 
Fig. 1. a) Hierarchical Dimension b) Temporal Dimension



In a realistic scenario, where the dynamics of the system cannot be ignored,
there is no single program representing each agent. Rather, there is a sequence
of programs representing the knowledge of each agent at each time point. Suppose
these time points were represented by the set $T = \{0, 1, \dots, c\}$ (where by
$c$ we mean the current time state), then, for example, the knowledge of agent
$\beta$ would be represented by the set of programs $\{P_{\beta_t} : t \in T\}$,
indexed according to the DAG $D_t = (B_T, E_t)$ where
$B_T = \{\beta_t : t \in T\}$ and
$E_t = \{(\beta_0, \beta_1), \dots, (\beta_{c-1}, \beta_c)\}$ as depicted in
Fig. 1 b).

The full dynamic hierarchical scenario, comprising all agents, is then
represented by the set of programs
$\mathcal{P}_D = \{P_{a_t} : a \in A, t \in T\}$ indexed by the DAG
$D = (A_T, E)$ where $A_T = \{a_t : a \in A, t \in T\}$.

The relationships between all these programs, i.e. the edges belonging to $E$,
remain to be defined. To this end, we propose three basic ways to
systematically relate these programs.

4.1 Equal Role Representation 

The first approach to combining the hierarchical and temporal dimensions is 
accomplished by assigning equal roles to both precedence relations. In this sce- 
nario, we maintain the temporal precedence relation within each agent, and the 
hierarchical one within each time state, and we do not relate any two programs 
that fall outside this scope, i.e. there is no precedence between a higher ranked 
older program and a lower ranked newer one. Accordingly, the set of edges $E$ of
the DAG $D$ contains the union of the following two sets of edges:

Time Dependence Edges (TDE): $\{(a_{t_1}, a_{t_2}) : a \in A,\ t_1, t_2 \in T,\ t_1 < t_2\}$.
Hierarchy Dependence Edges (HDE): $\{(a_t, b_t) : a, b \in A,\ t \in T,\ a < b\}$.

Intuitively, each rule can be used to reject any rule of a lower ranked agent
indexed by a time state equal to or lower than its own. This situation is
depicted in Fig. 2.
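
The two edge sets are easy to generate programmatically. The following Python
sketch assumes that states are encoded as (agent, time) pairs and that the
hierarchy is given as a set of pairs (a, b) meaning a < b; these encodings and
the function names are assumptions made for illustration only.

```python
def equal_role_edges(agents, hierarchy, times):
    """Union of the Time Dependence and Hierarchy Dependence edges."""
    tde = {((a, t1), (a, t2))
           for a in agents for t1 in times for t2 in times if t1 < t2}
    hde = {((a, t), (b, t)) for a, b in hierarchy for t in times}
    return tde | hde

def reachable(edges, src, dst):
    """True iff a non-empty directed path leads from src to dst."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return dst in seen

edges = equal_role_edges(["low", "high"], {("low", "high")}, [0, 1])

# A newer, higher ranked program can reject an older, lower ranked one ...
newer_higher_wins = reachable(edges, ("low", 0), ("high", 1))
# ... but an older higher ranked program and a newer lower ranked one are unrelated.
unrelated = (not reachable(edges, ("high", 0), ("low", 1))
             and not reachable(edges, ("low", 1), ("high", 0)))
```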

Fig. 2. Equal Role Representation

Remark 1. Throughout this section, we have chosen a simplified representation
of the DAGs to make their interpretation easier. For this purpose, we introduce
new nodes (meta-nodes) encapsulating part of the DAG (detail). To obtain the
complete DAG from this simplification one needs to replace the meta-node with
the detail while replacing the edges entering the meta-node with a set of edges
entering each source node of the detail. Similarly, one needs to replace each
edge departing from the meta-node with a set of edges departing from each sink
of the detail. In every DAG, we have added a new node labelled $\alpha'$, which
becomes its single sink, and an empty program associated with it, indicating
where the semantics corresponding to agent $\alpha$'s view of the overall system
at time state $c$ can be determined. Also, since the semantics of MDLP is
invariant wrt the transitive closure of the DAG, we will often omit some edges
that do not affect such transitive closure.

Such a scenario can be found in legal reasoning, where the legislative agency
is divided according to a hierarchy of power, governed by the principle Lex
Superior (Lex Superior Derogat Legi Inferiori) according to which the rule
issued by a higher hierarchical authority overrides the one issued by a lower
one, and the evolution of law in time is governed by the principle Lex Posterior
(Lex Posterior Derogat Legi Priori) according to which the rule enacted at a
later point in time overrides the earlier one. Lex Superior is encoded by the
Hierarchy Dependence Edges and Lex Posterior is encoded by the Time Dependence
Edges.

Allowing rejection governed by time and hierarchy alone opens the way to
contradiction, inasmuch as there are many pairs of programs not related
according to this graph. If the purpose of our agency system were to perform
some sort of
paraconsistent reasoning, such as in an agent based negotiation system trying to 
reach an agreement, this would be the ideal scenario: contradiction would gen- 
erate messages to the responsible agents to possibly review their positions. But 
often this is not the case and we may want to reduce the amount of contradiction, 
namely by establishing a skewed relation between the temporal and hierarchical 
dimensions. Two approaches will be explored in the following subsections. 
Fig. 3. Time Prevailing Representation



4.2 Time Prevailing Representation 

According to this representation, the DAG D contains, besides the Time and 
Hierarchy Dependence Edges, the following edges: 

Time Prevailing Edges (TPE): $\{(a_{t_1}, b_{t_2}) : a, b \in A,\ t_1, t_2 \in T,\ t_1 < t_2\}$.

The intuitive reading is that any rule indexed by a more recent time state 
overrides any older rule, independently of which agents these rules belong to. 
This situation is depicted in Fig. 3. 

This representation is particularly useful in very dynamic situations where 
competence is distributed, i.e. when knowledge changes rapidly and different 
agents will typically provide rules about different literals. This is so mainly 
because any newer rule always overrides any older one. It means that if a situation 
is completely defined by the rules issued by the community at a given time state, 
one can simply ignore older rules. 

The main drawback of this representation relates to the trustworthiness of
agents in the community. It requires all agents to be fully trusted because, in
allowing
all new rules to override all old ones, irrespective of their hierarchical position, 
any untrustworthy lower ranked agent can override any higher ranked agent just 
by issuing a rule at a later time state. This leads us to the next, alternative, 
representation. 



4.3 Hierarchy Prevailing Representation 

According to this representation, the DAG D contains, besides the Time and 
Hierarchy Dependence Edges, the following edges: 



Hierarchy Prevailing Edges (HPE): $\{(a_{t_1}, b_{t_2}) : a, b \in A,\ t_1, t_2 \in T,\ a < b\}$.
The intuitive reading is that any rule indexed by a higher ranked agent
overrides any lower ranked agent’s rule, independently of the time state it is 
indexed by. This situation is depicted in Fig. 4. 

This situation is useful, in contrast with the previous one, when some of the 
agents are untrustworthy because a lower ranked agent rule, to be used, may 
not be contradicted by any (even if older) higher ranked agent rule. The main 
drawback is that one has to consider the entire history of all higher ranked agents 
in order to accept/reject a rule provided by a lower ranked agent. However, a 
number of techniques to reduce the size of a dynamic logic program are being 
developed, useful for simplifying the time sequence of programs of each individual 
agent. These are outside the scope of this paper. 

Again in the context of Legal Reasoning, this scenario corresponds to the 
one used in many Legislatures, where collisions between rules are governed by 
the principle Lex Superior Priori Derogat Legi Inferiori Posterior, i.e. the rule 
issued by a higher hierarchical authority at an earlier point overrides the one 
issued by a lower hierarchical authority at a later point. 
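
The difference between the three representations shows up on a single pair of
states: an older program of a higher ranked agent and a newer program of a lower
ranked one. The Python sketch below compares them; the (agent, time) encoding,
the hierarchy given as pairs (a, b) meaning a < b, and all names are assumptions
made for this illustration.

```python
def edges_for(kind, agents, hierarchy, times):
    """Edge set for kind in {"equal", "time", "hierarchy"}."""
    earlier = [(t1, t2) for t1 in times for t2 in times if t1 < t2]
    tde = {((a, t1), (a, t2)) for a in agents for t1, t2 in earlier}
    hde = {((a, t), (b, t)) for a, b in hierarchy for t in times}
    edges = tde | hde
    if kind == "time":        # Time Prevailing Edges
        edges |= {((a, t1), (b, t2))
                  for a in agents for b in agents for t1, t2 in earlier}
    elif kind == "hierarchy":  # Hierarchy Prevailing Edges
        edges |= {((a, t1), (b, t2))
                  for a, b in hierarchy for t1 in times for t2 in times}
    return edges

def reachable(edges, src, dst):
    """True iff a non-empty directed path leads from src to dst."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return dst in seen

agents, hierarchy, times = ["low", "high"], {("low", "high")}, [0, 1]
old_high, new_low = ("high", 0), ("low", 1)

verdict = {}
for kind in ("equal", "time", "hierarchy"):
    e = edges_for(kind, agents, hierarchy, times)
    if reachable(e, old_high, new_low):
        verdict[kind] = "newer lower ranked program prevails"
    elif reachable(e, new_low, old_high):
        verdict[kind] = "older higher ranked program prevails"
    else:
        verdict[kind] = "unrelated"
```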

4.4 Representing Inter- and Intra- Agent Relationships 

The representations set forth in the previous sub-sections refer to a community 
of agents. Nevertheless, they can be used at different levels of abstraction to 
represent macro and micro aspects of a multi-agent system, in a unified manner. 
Let us suppose that agent $\alpha$ is composed of several sub-agents
concurrently performing dedicated tasks while reading and writing onto a common
knowledge structure. According to this view, agent $\alpha$ can now be seen as a
community of sub-agents $A_\alpha = \{\alpha_a, \alpha_b, \alpha_d, \alpha_e\}$,
related, for example, according to the DAG $D_\alpha = (A_\alpha, E_\alpha)$
where
$E_\alpha = \{(\alpha_a, \alpha_b), (\alpha_a, \alpha_e), (\alpha_b, \alpha_d), (\alpha_e, \alpha_d)\}$
as in Fig. 5. The overall dynamic system, comprising all agents and sub-agents,
is now represented by the set of programs
$\mathcal{P}_D = \{P_{a_t} : a_t \in A_T\}$ indexed by the DAG $D = (A_T, E)$
where
$A_T = \{a_t : a \in A \setminus \{\alpha\}, t \in T\} \cup \{a_t : a \in A_\alpha, t \in T\}$.
Fig. 5. Sub-agent Hierarchy



As for the relations between the programs, we propose a combination of the 
time and hierarchy prevailing representations to relate the sub-agents and agents 
respectively. As mentioned before, the time prevailing representation is the most 
efficient but requires all agents to be trusted. One would expect an agent to 
trust its component sub-agents. As for the representation of other agents, we 
will opt for the hierarchy prevailing relation. Formally, the set of edges in the 
DAG contains: 

Time Prevailing Edges (TPE): $\{(a_{t_1}, b_{t_2}) : a, b \in A_\alpha,\ t_1, t_2 \in T,\ t_1 < t_2\}$,
to model the relationships between the sub-agents of $\alpha$.

Hierarchy Prevailing Edges (HPE): $\{(a_{t_1}, b_{t_2}) : a, b \in A,\ t_1, t_2 \in T,\ a < b\}$,
to model the relationships between the agents of the system. Note that each 
edge entering (resp. departing from) at should be interpreted as a set of 
edges entering (resp. departing from) each of {oot, adt, cTet}- 

This situation is depicted in Fig. 6. Note however that this is just one proposal 
of the many possible existing combinations to represent such relations. 

5 Conclusions 

In this paper we have explored Multi-dimensional Dynamic Logic Programming
as a means to combine knowledge originating from different agents into a single
knowledge base point of view, with a precise declarative semantics. Depending
on the situation and the relations amongst the agents, we have envisaged several
classes of acyclic digraphs suitable for its encoding.

Based on the strengths of MDLP as a framework capable of simultaneously
representing several aspects of a system in a dynamic fashion, and of LUPS
[2] as a powerful language to specify the evolution of such representations by
means of transitions, we have launched into the design of an agent architecture,
MINERVA [11]. It aims at providing, on a sound theoretical basis, a common
agent framework based on the strengths of Logic Programming, to allow the
combination of several non-monotonic knowledge representation and reasoning
mechanisms developed in recent years.

Fig. 6. Inter- and Intra-Agent Relationship Representation

The use of Logic Programming for the overall endeavour is justified on the
ground of it providing a rigorous single encompassing theoretical basis for the
aforesaid topics, as well as an implementation vehicle for parallel and
distributed processing. Additionally, logic programming provides a formal high
level flexible instrument for the rigorous specification and experimentation
with computational designs, making it extremely useful for prototyping, even
when other, possibly lower level, target implementation languages are envisaged.

Rational agents, in our opinion, will require an admixture of any number of
those reasoning mechanisms mentioned in the introduction, to carry out their
tasks. To this end, a MINERVA agent is based on a modular design where a
common knowledge base is concurrently manipulated by specialized sub-agents.
The common knowledge base contains all knowledge shared by more than one
sub-agent. It is conceptually divided into the following components:
Capabilities, Intentions, Goals, Plans, Reactions, Object Knowledge Base and
Internal Behaviour Rules. Although conceptually divided into such components,
all these modules share a common representation mechanism based on MDLP and
LUPS, the former to represent knowledge at each state and the latter to
represent the state transitions, i.e. the common part of the agent's behaviour.
Every agent is composed of specialized, functionality-related sub-agents that
execute various specialized tasks. Examples of such sub-agents are those
implementing the reactive, planning, scheduling, belief revision, goal
management, learning, dialogue management, information gathering, preference
evaluation, strategy, and diagnosis functionalities. These sub-agents contain a
LUPS program encoding their behaviour, and interfacing with the Common Knowledge
Base. Whilst some of those sub-agents' functionalities are fully specifiable in
LUPS, others require private specialized procedures where LUPS serves as an
interface language.
Abstract. Multi-adjoint logic programs generalise monotonic logic programs
introduced in [1] in that simultaneous use of several implications
in the rules and rather general connectives in the bodies are allowed.

In this work, a procedural semantics is given for the paradigm of multi- 
adjoint logic programming and completeness theorems are proved. 



1 Introduction 

Multi-adjoint logic programming was introduced in [5] as a refinement of both
initial work in [7] and residuated logic programming [1]. It allows for very
general connectives in the body of the rules, and sufficient conditions for the
continuity of its semantics are known. Such an approach is interesting for
applications: in [6] a system is presented where connectives are learnt from
different users' examples. Thus, one can imagine a scenario in which knowledge
is described by a many-valued logic program where connectives have many-valued
truth functions and @ is an aggregation operator (arithmetic mean, weighted
sum, ...), where different implications could be needed for different purposes,
and different aggregators are defined for different users, depending on their
preferences, e.g. the rule below:

hotel_reservation(Business_Location, Time, Hotel) $\leftarrow_i$
    @(near_to(Business_Location, Hotel),
      cost_reasonable(Hotel, Time),
      building_is_fine(Hotel)).    with truth value 0.8

The framework of multi-adjoint logic programming was introduced in [5] as a 
generalisation of the monotonic and residuated logic programs given in [1]. The 
special features of multi-adjoint logic programs are: (1) a number of different 
implications are allowed in the bodies of the rules, (2) sufficient conditions for 
continuity of its semantics are known, and (3) the requirements on the lattice of 
truth-values are weaker than those for the residuated approach.
The purpose of this work is to provide a procedural semantics to the paradigm
of multi-adjoint logic programming. This work is an extension of [3], with a 
different treatment of reductants and an improved version of the completeness 
results, based on the so-called supremum property. 

The central topics of this paper are mainly at the theoretical level, however 
the obtained results can be applied in a number of contexts: 

1. The integration of information retrieval and database systems requires meth- 
ods for dealing with uncertainty; there are already a number of many-valued 
approaches to the general theory of databases, but none of them contains 
a formal mathematical proof of the relation between the relational algebra 
and its denotational counterpart; a by-product of the obtained results is the 
possibility of defining a fuzzy relational algebra and a fuzzy Datalog, the 
completeness result then shows that the expressive power of fuzzy Datalog
is the same as the computational power of the fuzzy relational algebra;

2. One of the problems of fuzzy knowledge bases is handling a great amount 
of items with very small confidence value. The approach introduced in this 
paper enables us to propose a sound and complete threshold computation 
model oriented to the best correct answers up to a prescribed tolerance level. 

3. The multi-adjoint framework can also be applied to abduction problems. 
In [4] the possibility of obtaining the cheapest possible explanation to an 
abduction problem wrt a cost function by means of a logic programming 
computation followed by a linear programming optimization has been shown. 

2 Preliminary Definitions 

To make this paper as self-contained as possible, the necessary definitions about 
multi-adjoint structures are included in this section. For motivating comments,
the interested reader is referred to [5]. 

Definition 1. Let $(L, \preceq)$ be a complete lattice. A multi-adjoint lattice
$\mathcal{L}$ is a tuple $(L, \preceq, \leftarrow_1, \&_1, \dots, \leftarrow_n, \&_n)$
satisfying the following items:

1. $(L, \preceq)$ is bounded, i.e. it has bottom and top elements;

2. $\top \mathbin{\&_i} \vartheta = \vartheta \mathbin{\&_i} \top = \vartheta$
   for all $\vartheta \in L$ for $i = 1, \dots, n$;

3. $(\leftarrow_i, \&_i)$ is an adjoint pair in $(L, \preceq)$ for
   $i = 1, \dots, n$; i.e.

(a) Operation $\&_i$ is increasing in both arguments,

(b) Operation $\leftarrow_i$ is increasing in the first argument and decreasing
    in the second argument,

(c) For any $x, y, z \in L$, we have that $x \preceq (y \leftarrow_i z)$ holds
    if and only if $(x \mathbin{\&_i} z) \preceq y$ holds.
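
Conditions 2 and 3(c) can be checked exhaustively on a finite grid of truth
values. The Python sketch below does so for three classical pairs on the unit
interval (product, Gödel and Łukasiewicz); the choice of these pairs, the
five-point grid and the function names are assumptions made for illustration,
since the definition itself is stated for an arbitrary complete lattice.

```python
from fractions import Fraction
from itertools import product as cartesian

GRID = [Fraction(k, 4) for k in range(5)]   # 0, 1/4, 1/2, 3/4, 1
ONE = Fraction(1)

# Each entry is (conjunctor x & z, implication y <- z).
PAIRS = {
    "product": (lambda x, z: x * z,
                lambda y, z: ONE if z <= y else y / z),
    "goedel": (lambda x, z: min(x, z),
               lambda y, z: ONE if z <= y else y),
    "lukasiewicz": (lambda x, z: max(Fraction(0), x + z - 1),
                    lambda y, z: min(ONE, 1 - z + y)),
}

def has_unit(conj):
    """Condition 2: the top element is a two-sided unit of the conjunctor."""
    return all(conj(ONE, v) == v == conj(v, ONE) for v in GRID)

def is_adjoint(conj, impl):
    """Condition 3(c): x <= (y <- z) iff (x & z) <= y, for all grid points."""
    return all((x <= impl(y, z)) == (conj(x, z) <= y)
               for x, y, z in cartesian(GRID, repeat=3))

checks = {name: has_unit(c) and is_adjoint(c, i) for name, (c, i) in PAIRS.items()}

# Mixing the product conjunctor with the Goedel implication breaks adjointness.
mismatched = is_adjoint(PAIRS["product"][0], PAIRS["goedel"][1])
```

A grid check of this kind can refute adjointness (as for the mismatched pair)
but only suggests it for the full interval; it does not replace a proof.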

From the point of view of expressiveness, it is interesting to allow extra
operators to be involved with the operators in the multi-adjoint lattice. The
structure which captures this possibility is that of a multi-adjoint
$\Omega$-algebra which can be understood as an extension of a multi-adjoint
lattice containing a number of extra operators given by a signature $\Omega$.
We will be working with two $\Omega$-algebras: the first one, $\mathfrak{F}$, to
define the syntax of our programs, and the second one, $\mathfrak{L}$, to host
the manipulation of the truth-values of the formulas in the programs. To avoid
possible name-clashes, we will denote the interpretation of an operator symbol
$\omega$ in $\Omega$ under $\mathfrak{L}$ as $\dot{\omega}$ (a dot on the
operator), whereas $\omega$ itself will denote its interpretation under
$\mathfrak{F}$.

Definition 2 (Multi-Adjoint Logic Programs). A multi-adjoint logic program on a
multi-adjoint $\Omega$-algebra $\mathfrak{F}$ with values in a multi-adjoint
lattice $\mathfrak{L}$ (in short, multi-adjoint program) is a set $\mathbb{P}$
of rules of the form $\langle (A \leftarrow_i \mathcal{B}), \vartheta \rangle$.

1. The rule $(A \leftarrow_i \mathcal{B})$ is a formula of $\mathfrak{F}$;

2. The confidence factor $\vartheta$ is an element (a truth-value) of $L$;

3. The head of the rule $A$ is a propositional symbol of $\Pi$.

4. The body formula $\mathcal{B}$ is a formula of $\mathfrak{F}$ built from
   propositional symbols $B_1, \dots, B_n$ ($n \ge 0$) by the use of conjunctors
   $\&_1, \dots, \&_n$ and $\wedge_1, \dots, \wedge_k$, disjunctors
   $\vee_1, \dots, \vee_l$ and aggregators $@_1, \dots, @_m$.

5. Facts are rules with body $\top$.

6. A query (or goal) is a propositional symbol intended as a question $?A$
   prompting the system.

As usual, an interpretation is a mapping $I\colon \Pi \to L$. The set of all
interpretations of the formulas defined by the $\Omega$-algebra $\mathfrak{F}$
in the $\Omega$-algebra $\mathfrak{L}$ is denoted $\mathcal{I}_{\mathfrak{L}}$.
Note that each of these interpretations can be uniquely extended to the whole
set of formulas, $\hat{I}\colon \mathfrak{F} \to L$.

The ordering $\preceq$ of the truth-values $L$ can be easily extended to
$\mathcal{I}_{\mathfrak{L}}$, which also inherits the structure of complete
lattice.

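Definition 3.

1. An interpretation $I \in \mathcal{I}_{\mathfrak{L}}$ satisfies
   $\langle A \leftarrow_i \mathcal{B}, \vartheta \rangle$ if and only if
   $\vartheta \preceq \hat{I}(A \leftarrow_i \mathcal{B})$.

2. An interpretation $I \in \mathcal{I}_{\mathfrak{L}}$ is a model of a
   multi-adjoint logic program $\mathbb{P}$ iff all weighted rules in
   $\mathbb{P}$ are satisfied by $I$.

3. An element $\lambda \in L$ is a correct answer for a program $\mathbb{P}$ and
   a query $?A$ if for any interpretation $I \in \mathcal{I}_{\mathfrak{L}}$
   which is a model of $\mathbb{P}$ we have $\lambda \preceq I(A)$.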

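The immediate consequences operator, given by van Emden and Kowalski,
can be easily generalised to the framework of multi-adjoint logic programs.

Definition 4. Let $\mathbb{P}$ be a multi-adjoint program. The immediate
consequences operator
$T_{\mathbb{P}}\colon \mathcal{I}_{\mathfrak{L}} \to \mathcal{I}_{\mathfrak{L}}$,
mapping interpretations to interpretations, is defined by

$$T_{\mathbb{P}}(I)(A) = \sup\{\vartheta \mathbin{\dot{\&}_i} \hat{I}(\mathcal{B}) \mid \langle A \leftarrow_i \mathcal{B}, \vartheta \rangle \in \mathbb{P}\}$$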

As usual, the semantics of a multi-adjoint logic program is characterised by
the post-fixpoints of $T_{\mathbb{P}}$, see [5]; that is, an interpretation $I$
of $\mathcal{I}_{\mathfrak{L}}$ is a model of a multi-adjoint logic program
$\mathbb{P}$ iff $T_{\mathbb{P}}(I) \sqsubseteq I$. It is remarkable that this
result still holds without any further assumptions on conjunctors (in
particular, they need not be commutative or associative).
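
The operator and the post-fixpoint characterisation can be tried out on a small
program. The Python sketch below fixes one particular instance that is an
assumption of this illustration, not something prescribed by the text: truth
values are rationals in the unit interval, the only adjoint conjunctor is the
product, a rule body is given directly as a function of the interpretation, and
the atom and function names are hypothetical.

```python
from fractions import Fraction as F

def prod(x, y):
    """Product conjunctor on the unit interval."""
    return x * y

def t_p(rules, interp):
    """Immediate consequences: for each head, the sup (here max) over its rules."""
    out = {atom: F(0) for atom in interp}        # sup of the empty set is bottom
    for head, conj, body, weight in rules:
        out[head] = max(out[head], conj(weight, body(interp)))
    return out

def is_model(rules, interp):
    """Post-fixpoint test: T_P(I) below I pointwise."""
    image = t_p(rules, interp)
    return all(image[a] <= interp[a] for a in interp)

def least_model(rules, atoms, max_steps=1000):
    """Iterate T_P from the bottom interpretation until it stabilises."""
    interp = {a: F(0) for a in atoms}
    for _ in range(max_steps):
        nxt = t_p(rules, interp)
        if nxt == interp:
            return interp
        interp = nxt
    raise RuntimeError("no fixpoint reached within max_steps")

rules = [
    ("near", prod, lambda i: F(1), F(9, 10)),    # fact: body is top
    ("cheap", prod, lambda i: F(1), F(1, 2)),    # fact: body is top
    ("good_hotel", prod, lambda i: (i["near"] + i["cheap"]) / 2, F(4, 5)),
]
atoms = ["near", "cheap", "good_hotel"]

lm = least_model(rules, atoms)
top = {a: F(1) for a in atoms}
bottom = {a: F(0) for a in atoms}
```

The aggregator in the last rule is the arithmetic mean, so the value obtained
for good_hotel is 4/5 times the mean of 9/10 and 1/2.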

Regarding continuity, the following theorem was proved in [5]. 
Theorem 1 ([5]).

1. If all the operators occurring in the bodies of the rules of a program
   $\mathbb{P}$ are continuous, and the adjoint conjunctions are continuous in
   their second argument, then $T_{\mathbb{P}}$ is continuous.

2. If the operator $T_{\mathbb{P}}$ is continuous for all programs $\mathbb{P}$
   on $\mathfrak{L}$, then any operator in the body of the rules is continuous.

3 Procedural Semantics of Multi-adjoint Logic Programs 

Once we have shown that the $T_{\mathbb{P}}$ operator can be continuous under
very general hypotheses, the least model can be reached in at most countably
many iterations. Therefore, it is worth defining a procedural semantics which
allows us to actually construct the answer to a query against a given program.

For the formal description of the computational model, we will consider an
extended language $\mathfrak{F}'$ defined on the same graded set, but whose
carrier is the disjoint union $\Pi \uplus L$; this way we can work
simultaneously with propositional symbols and with the truth-values they
represent.

Definition 5. Let T be a multi-adjoint program, and let V C L be the set of 
truth values of the rules in P. The extended language j?' is the corresponding 
fl -algebra of formulas freely generated from the disjoint union of II and V . 

We will refer to the formulas in the language simply as extended formulas. 
An operator symbol uj interpreted under will be denoted as w. 

Our computational model will take a query (an atom $A$), and will provide a lower
bound of the value of $A$ under any model of the program. Given a program $\mathbb{P}$,
we define the following admissible rules for transforming any extended formula.

Definition 6. Admissible rules are defined as follows:

1. Substitute an atom $A$ in an extended formula by $(\vartheta \mathbin{\dot{\&}_i} \mathcal{B})$ whenever there
exists a rule $\langle A \leftarrow_i \mathcal{B}, \vartheta \rangle$ in $\mathbb{P}$.

2. Substitute an atom $A$ in an extended formula by $\bot$.

3. Substitute an atom $A$ in an extended formula by $\vartheta$ whenever there exists a
fact $\langle A \leftarrow_i \top, \vartheta \rangle$ in $\mathbb{P}$.

Note that if an extended formula turns out to have no propositional symbols,
then it can be directly interpreted in the multi-adjoint $\Omega$-algebra $\mathfrak{L}$. This justifies
the following definition of computed answer.

Definition 7. Let $\mathbb{P}$ be a multi-adjoint program, and let $?A$ be a goal. An element
$\dot{@}[r_1, \dots, r_m]$, with $r_i \in L$ for all $i \in \{1, \dots, m\}$, is said to be a computed
answer if there is a sequence $G_0, \dots, G_{n+1}$ such that

1. $G_0 = A$ and $G_{n+1} = \dot{@}[r_1, \dots, r_m]$, where $r_i \in L$ for all $i = 1, \dots, m$.

2. Every $G_i$, for $i = 1, \dots, n$, is a formula in $\mathfrak{F}'$.

3. Every $G_{i+1}$ is inferred from $G_i$ by one of the admissible rules.

Note that our procedural semantics, instead of being refutation-based (this 
is not possible, since negation is not allowed in our approach), is oriented to 
obtaining a bound of the optimal correct answer of the query. 
3.1 Reductants

It might be the case that for some lattices it is not possible to get the correct
answer. Simply consider $L$ to be the powerset of a two-element set $\{a, b\}$ ordered
by inclusion, and the following example from Morishita, used in [2]:

Example 1. Consider a multi-adjoint program $\mathbb{P}$ with rules $A \xleftarrow{a} B$ and $A \xleftarrow{b} B$ and
fact $\langle B, \top \rangle$. Assuming that the adjoint conjunction to $\leftarrow$ has the usual boundary
conditions, the correct answer to the query $?A$ is $\top$, since it has to be an
upper bound of all the models of the program, therefore it has to be greater than
both $a$ and $b$. But the only computed answers are either $a$ or $b$. □
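The gap in Example 1 can be reproduced with a small sketch. It assumes that the two rules carry the weights {a} and {b}, that the adjoint conjunctor on the powerset lattice is set intersection (so that the top element is neutral), and that the program is acyclic; the admissible rule that replaces an atom by the bottom element is left out because it only adds smaller answers. All function names are hypothetical.

```python
TOP = frozenset({"a", "b"})
BOTTOM = frozenset()

# Rules <head <- body, weight> and facts over the powerset of {a, b}.
RULES = [("A", frozenset({"a"}), "B"), ("A", frozenset({"b"}), "B")]
FACTS = {"B": TOP}


def conj(weight, value):
    """Assumed adjoint conjunctor: intersection, which satisfies TOP & x = x."""
    return weight & value


def computed_answers(atom):
    """Values reachable by picking ONE rule (or fact) per atom, as the admissible rules do."""
    answers = set()
    if atom in FACTS:
        answers.add(FACTS[atom])
    for head, weight, body in RULES:
        if head == atom:
            for body_value in computed_answers(body):
                answers.add(conj(weight, body_value))
    return answers


def least_model_value(atom):
    """Value in the least model: the supremum (union) over ALL rules for the atom."""
    value = FACTS.get(atom, BOTTOM)
    for head, weight, body in RULES:
        if head == atom:
            value = value | conj(weight, least_model_value(body))
    return value
```

Here `computed_answers("A")` contains only the singletons {a} and {b}, while `least_model_value("A")` is the top element {a, b}, which no single derivation reaches.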

The idea to cope with this problem is the generalisation of the concept of
reductant [2] to our framework. Namely, whenever we have a finite number
of rules $\langle A \leftarrow_i \mathcal{B}_i, \vartheta_i \rangle$, for $i = 1, \dots, k$, there should exist another
rule which allows us to get the correct value of $A$ under the program. That can
be rephrased as follows:

As any rule $\langle A \leftarrow_i \mathcal{B}_i, \vartheta_i \rangle$ contributes the value $\vartheta_i \mathbin{\&_i} b_i$ to
the calculation of the lower bound for the truth-value of $A$, we would like to
have the possibility of reaching the supremum of all the contributions, in the
computational model, in a single step. This leads to the following definition.

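Definition 8 (Reductant). Let $\mathbb{P}$ be a multi-adjoint program; assume that all
the rules in $\mathbb{P}$ with head $A$ are $\langle A \leftarrow_i \mathcal{B}_i, \vartheta_i \rangle$ for $i = 1, \dots, n$. A reductant for $A$ is a
rule $\langle A \leftarrow @(\mathcal{B}_1, \dots, \mathcal{B}_n), \vartheta \rangle$ such that for any $b_1, \dots, b_n$ we have

$$\sup\{\vartheta_i \mathbin{\&_i} b_i \mid i = 1, \dots, n\} = \vartheta \mathbin{\&} @(b_1, \dots, b_n)$$

where $\&$ is the adjoint conjunctor to the implication $\leftarrow$.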

Remark 1. When all the elements in the multiset are the same (e.g. all rules in
the program have the same implication), and the operator $\&$ is continuous, then
the following equality holds

$$\sup\{\vartheta_i \mathbin{\&} b_i \mid i = 1, \dots, n\} = \sup \vartheta_i \mathbin{\&} \sup b_i$$

which immediately implies that choosing $\vartheta = \sup \vartheta_i$ and $@(b_1, \dots, b_n) = \sup b_i$
we have constructed a reductant.

The example below shows a program with reductants, whose truth-values 
range over the lattice of closed intervals in [0, 1]. 

Example 2. Consider the lattice of all the closed intervals in $[0, 1]$, denoted $\mathcal{C}[0, 1]$,
under the ordering $[a, b] \preceq [c, d]$ iff $a \le c$ and $d \le b$, and consider the
componentwise extended definition of the Łukasiewicz, product and Gödel implications to
intervals. Let $\mathbb{P}$ be the program with rules

where $\vartheta_1 < \tau_1$, and facts

Let us see that $\mathbb{P}$ has reductants. In this particular case, the only head in the
rules is $A$, so we have to define a reductant for $A$ in $\mathbb{P}$.

From the two rules, we have the associated conjunctors to the implications,
$\&_P$ and $\&_G$ (also defined componentwise). Given their two truth-intervals $[\vartheta_1, \vartheta_2]$ and
$[\tau_1, \tau_2]$, there should exist a truth-interval $[\varepsilon_1, \varepsilon_2]$, a conjunctor $\&$ and an
aggregator $@$ such that for all $[b_1, b_2], [c_1, c_2] \in \mathcal{C}[0, 1]$:

$$\sup\{[\vartheta_1, \vartheta_2] \mathbin{\&_P} [b_1, b_2],\ [\tau_1, \tau_2] \mathbin{\&_G} [c_1, c_2]\} = [\varepsilon_1, \varepsilon_2] \mathbin{\&} @([b_1, b_2], [c_1, c_2])$$

In this case, it suffices to consider $\& = \&_G$, $[\varepsilon_1, \varepsilon_2] = [\max(\vartheta_1, \tau_1), \min(\vartheta_2, \tau_2)]$,
and $@([b_1, b_2], [c_1, c_2]) = [\max(\vartheta_1 b_1, c_1), \min(\vartheta_2 b_2, c_2)]$. □

Certainly, it would be interesting to consider only programs which contain all
their reductants, but this might seem too heavy a condition on our programs. The
following proposition shows that this is not the case: we can assume that a
program contains all its reductants, since its set of models is not modified.

Proposition 1. Any reductant $A \xleftarrow{\vartheta} \mathcal{B}$ of $\mathbb{P}$ is satisfied by any model of $\mathbb{P}$. In
short, $\mathbb{P} \models A \xleftarrow{\vartheta} \mathcal{B}$.

It is possible to construct reductants for any head-of-rule in a given program 
under the only requirement that the truth-value set is complete under suprema; 
and this is actually an assumption for all multi-adjoint programs, which are 
based on a multi-adjoint lattice. 

Definition 9 (Construction of reductants). Let $\mathbb{P}$ be a multi-adjoint program;
assume that all the rules in $\mathbb{P}$ with head $A$ are $\langle A \leftarrow_i \mathcal{B}_i, \vartheta_i \rangle$ for $i = 1, \dots, n$.
A reductant for $A$ is any rule $\langle A \leftarrow @(\mathcal{B}_1, \dots, \mathcal{B}_n), \top \rangle$ where $\leftarrow$ is any
implication with an adjoint conjunctor (let us denote it $\&$) and the aggregator $@$ is
defined as follows

$$@(b_1, \dots, b_n) = \sup\{\vartheta_1 \mathbin{\&_1} b_1, \dots, \vartheta_n \mathbin{\&_n} b_n\}$$

It is immediate to prove that the rule constructed in the definition above is
actually a reductant for $A$ in $\mathbb{P}$, and we state the fact in the following proposition.

Proposition 2. Under the hypotheses of the previous definition, the defined rule
$\langle A \leftarrow @(\mathcal{B}_1, \dots, \mathcal{B}_n), \top \rangle$ is a reductant for $A$ in $\mathbb{P}$.
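The construction of a reductant can be sketched as follows, again on the powerset of {a, b} with set intersection as an assumed adjoint conjunctor and set union as the supremum. The helper `make_reductant` and the rule list `RULES_FOR_A` are hypothetical names; the sketch builds the aggregator as the supremum of the individual rule contributions and gives the resulting rule the top weight.

```python
from functools import reduce

TOP = frozenset({"a", "b"})
BOTTOM = frozenset()


def intersect(weight, value):
    """Assumed adjoint conjunctor on the powerset lattice (TOP & x = x)."""
    return weight & value


def make_reductant(rules):
    """rules: list of (weight, conjunctor) pairs for all rules sharing one head.

    Returns the aggregator @(b_1, ..., b_n) = sup{ weight_i &_i b_i }.
    """
    def aggregator(*body_values):
        terms = [conj(weight, value) for (weight, conj), value in zip(rules, body_values)]
        return reduce(lambda x, y: x | y, terms, BOTTOM)
    return aggregator


# The two rules for A from the powerset example, with weights {a} and {b}.
RULES_FOR_A = [(frozenset({"a"}), intersect), (frozenset({"b"}), intersect)]
aggregate = make_reductant(RULES_FOR_A)

# The reductant has weight TOP, so one derivation step yields TOP & @(b_1, b_2).
ANSWER = intersect(TOP, aggregate(TOP, TOP))
```

With both body values at the top element, the single reductant step yields {a, b}, the value that neither original rule reaches on its own.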

Note that we have followed just traditional techniques of logic programming, 
and discarded non-determinism by using reductants. A possible disadvantage 
of this technique is that the full search space must be traversed (every rule of 
every atom must be evaluated), although this need not be necessary in many 
circumstances. It is clear that some evaluation strategies might start by executing 
non-deterministically the rules for a given atom, and finally the reductant. This 
joined with some memoizing or tabling technique would not have significant 
overhead, and could improve performance. 
3.2 Completeness Results

The proof of the completeness theorems follows from some technical results. 
The first lemma below states that the least fix-point is also the least model of a 
program; the second states a characterisation of correct answers in terms of the 
Tp operator. 

Lemma 1. For every model $I$ of $\mathbb{P}$ we have that $T_{\mathbb{P}}^{\omega}(\triangle) \sqsubseteq I$.

Lemma 2. $\lambda \in L$ is a correct answer for program $\mathbb{P}$ and query $?A$ if and only
if $\lambda \preceq T_{\mathbb{P}}^{\omega}(\triangle)(A)$.

Now, in order to match correct and computed answers, the proposition below,
whose proof is based on induction on $n$, shows that any iteration of the $T_{\mathbb{P}}$ operator
is, indeed, a computed answer.

Proposition 3. Let $\mathbb{P}$ be a program; then $T_{\mathbb{P}}^{n}(\triangle)(A)$ is a computed answer for
all $n$ and for every query $?A$.

We have now all the required background to prove a completeness result. 

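Theorem 2. For every correct answer $\lambda \in L$ for a program $\mathbb{P}$ and a query $?A$,
there exists a chain of elements $\lambda_n$ such that $\lambda \preceq \sup \lambda_n$, and such that for arbitrary
$n_0$ there exists a computed answer $\delta$ such that $\lambda_{n_0} \preceq \delta$.

Proof: Consider $\lambda_n = T_{\mathbb{P}}^{n}(\triangle)(A)$. As $\lambda$ is a correct answer, we have that

$$\lambda \preceq T_{\mathbb{P}}^{\omega}(\triangle)(A) = \sup\{T_{\mathbb{P}}^{n}(\triangle)(A) \mid n \in \mathbb{N}\} = \sup \lambda_n$$

since $T_{\mathbb{P}}^{\omega}(\triangle)$ is a model. Now we can choose $\delta$ to be $T_{\mathbb{P}}^{n}(\triangle)(A)$ for any $n \ge n_0$
and the theorem follows directly by the monotonicity of the $T_{\mathbb{P}}$ operator and
Proposition 3. □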

The theorem above can be further refined under the assumption of the so- 
called supremum property: 

Definition 10. A cpo $L$ is said to satisfy the supremum property if for every
directed set $X \subseteq L$ and for all $\varepsilon$ we have that if $\varepsilon \prec \sup X$ then there exists
$\delta \in X$ such that $\varepsilon \prec \delta \preceq \sup X$.

Theorem 3 below states that any correct answer can be approximated up to 
any lower bound. 

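Theorem 3. Assume $L$ has the supremum property; then for every correct answer
$\lambda \in L$ for a program $\mathbb{P}$ and a query $?A$, and arbitrary $\varepsilon \prec \lambda$, there exists a
computed answer $\delta$ such that $\varepsilon \prec \delta$.

Proof: As $\lambda$ is a correct answer, then $\lambda \preceq T_{\mathbb{P}}^{\omega}(\triangle)(A)$, since $T_{\mathbb{P}}^{\omega}(\triangle)$ is a model
of $\mathbb{P}$. By definition we have $T_{\mathbb{P}}^{\omega}(\triangle)(A) = \sup\{T_{\mathbb{P}}^{n}(\triangle)(A) \mid n \in \mathbb{N}\}$ and, by
hypothesis, $\varepsilon \prec \lambda \preceq T_{\mathbb{P}}^{\omega}(\triangle)(A)$. The supremum property states that there exists
an element $\delta = T_{\mathbb{P}}^{n_0}(\triangle)(A)$ in $\{T_{\mathbb{P}}^{n}(\triangle)(A) \mid n \in \mathbb{N}\}$ such that $\varepsilon \prec \delta \preceq T_{\mathbb{P}}^{\omega}(\triangle)(A)$.
This finishes the proof, for $T_{\mathbb{P}}^{n_0}(\triangle)(A)$ is a computed answer, by Proposition 3. □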
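The approximation argument can be made concrete with a small sketch on the unit interval. It assumes a hypothetical one-atom program whose only rule has weight 1, the product conjunctor, and a body that averages the current value of the atom with the top element; the iterates then form the chain 0, 1/2, 3/4, ... whose supremum 1 is never reached, yet every bound strictly below 1 is exceeded after finitely many steps. The names `step`, `tp_chain` and `first_iteration_above` are illustrative only.

```python
def step(value: float) -> float:
    """One T_P step for the hypothetical rule <A <-_P mean(A, top), 1> on [0, 1]."""
    return 1.0 * ((value + 1.0) / 2.0)


def tp_chain(bottom: float = 0.0):
    """Yield the iterates T^0(bottom), T^1(bottom), T^2(bottom), ..."""
    value = bottom
    while True:
        yield value
        value = step(value)


def first_iteration_above(epsilon: float, max_iter: int = 60):
    """Return (n, T^n) for the first iterate strictly above epsilon, or None."""
    for n, value in enumerate(tp_chain()):
        if value > epsilon:
            return n, value
        if n >= max_iter:
            return None
```

For instance, `first_iteration_above(0.9)` stops at the fourth iterate, since 1 - 1/16 is the first value in the chain above 0.9.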
4 Conclusions and Future Work

We have presented a framework for studying more elaborate proof procedures 
for multi-adjoint programs: a procedural semantics has been introduced, and two 
quasi-completeness theorems are stated and proved. 

As future work, from the theoretical side, it is necessary to further inves- 
tigate lattices with the supremum property; from the not-so-theoretical side, a 
practical evaluation of the proposed approach has to be performed, to evaluate 
the appropriateness of several possible optimisation techniques. 
Abstract. An approach for representing and reasoning with 3-D qualitative
orientation of point objects is presented. The model in 3-D is an extension of
the Zimmermann and Freksa orientation model in 2-D. The paper presents
attempts to represent 3-D spatial orientation in a final 3-D model. An iconic
notation for 3-D spatial orientation relations is presented and the algebra is also
explained.

Keywords: Spatial Reasoning, Qualitative Reasoning, Qualitative Orientation. 



1 Introduction 



One of the main aims of the Artificial Intelligence field is to simulate human 
behaviour in general and build robots with a human-like performance in particular. 
The principal goal of the Qualitative Spatial Reasoning field is to represent our 
everyday common sense knowledge about the physical world, and the underlying 
abstractions used by engineers and scientists when they create quantitative models. 
Kak [9] points out that the intelligent machines of the future might
carry out temporal reasoning, spatial reasoning and also reason over interrelated
entities occupying space and changing in time with respect to their attributes and
spatial interrelationships. Spatial information that we obtain through perception is
coarse and imprecise, thus qualitative models, which reason with distinguishing
characteristics rather than with exact measures, seem to be more appropriate to deal
with this kind of knowledge.

Suppose we want to know the qualitative orientation of a workmate's office in
our university building (which has more than one floor) with respect to our current
position. Or suppose we know the relative orientation between some offices in that
building and we want to know the orientation of every office with respect to the rest.
In that case, we need to know the height of every office, that is, we need to represent
and reason with a 3-dimensional orientation model.

There exist mainly three qualitative models for orientation which are not based on 
projections into external Reference Systems (RS): Freksa and Zimmermann's model 
[2, 3, 4, 5]; Hernandez's approach [7, 8]; and Frank's approach [6]. In these models, 
space is divided into qualitative regions by means of RSs, which are centred on the 
reference objects. Spatial objects are always simplified to points, which are the 
representational primitives. Of the three models not based on projections,
Zimmermann and Freksa's model is considered more cognitive because no extrinsic
reference system is necessary. This model has been chosen for extension to 3-D.

We distinguish two parts in the reasoning process: the Basic Step of
the Inference Process (BSIP) and the Full Inference Process (FIP). The BSIP can be
defined in general terms as follows: given a spatial relationship between a point c and
a RS, and another spatial relationship between a point d and another RS of which
point c is a part, the BSIP consists of obtaining the spatial relationship of
point d with respect to (wrt) the first RS. When more relationships among several
spatial landmarks are provided, the FIP is necessary. It consists of repeating the BSIP
as many times as possible, until no more information can be inferred.

To accomplish the integration of orientation, distance and cardinal directions into 
the same spatial model we will use the following three steps: 

(1) the representation of the spatial aspect to be integrated; 

(2) the definition of the BSIP for each represented spatial aspect; and 

(3) the definition of the FIP for this spatial aspect. 

In this paper, we are going to focus on those three parts. 



2 The 2-D Zimmermann and Freksa Orientation Representation

In the approach of [2, 3, 4, 5], the orientation RS is defined by a point and a director
vector ab, which defines the left/right dichotomy. The RS also includes the line
perpendicular to ab through the point b, which defines the first front/back dichotomy
and can be seen as the straight line that joins our shoulders. This RS divides space
into 9 qualitative regions. A finer distinction can be made in the back regions by
drawing the perpendicular line through the point a. In this case, space is divided into
15 qualitative regions. The point a defines the second front/back dichotomy of the RS.

The information represented is where the point c is with respect to the RS ab, that is, c
wrt ab. This information can also be expressed as a result of applying the following
operations: Identity, Inverse, Homing, Homing Inverse, Shortcut and Shortcut Inverse.



3 The 3-D Orientation Model 

3.1 Representation 

We consider a plane that contains the two points which define the 2-D RS, a and b.

We must decide on the plane before choosing the points [12]. In that case, our 3-D
orientation RS will be based on a point and a reference plane. The reference plane
chosen will be a plane parallel to the floor. If we do not have any specific
plane to refer to, we must decide on one first. By a reference plane we
mean the whole family of planes parallel to it.

We consider two more heights, which define the upper and lower height,
respectively. In this way, the 2-D orientation model is extended to the third
dimension, as shown in figure 1. With these three heights, the 15 qualitative
regions are extended to 45 qualitative regions.



Fig. 1. a) left, straight and right; b) front, b-orthogonal, neutral, a-orthogonal and
back; c) up, same and down

Fig. 2. A single cell divided into five heights and the names of every part; and
the representation of the different heights.



Having these three heights (the up height, the same height and the down height
with respect to the plane that passes through the point a) implies that a and b always
have to be in the same plane. In most cases this does not happen. As in the 2-D RS,
we can define a fine 3-D orientation RS by including five different heights: up,
b-height, between, a-height and down (fig. 3 a). The 3-D orientation RS divides space
into 75 qualitative regions. A 3-D iconic representation of the RS is shown in fig. 3 b).

The name of every region is defined according to its position. We will
use acronyms such as ulf if its position is up-left-front; usf for up-straight-front; urf
for up-right-front, and so on (figure 4).
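The region counts can be checked with a short enumeration. The sketch assumes that each region is the combination of one height, one left/straight/right value and one front-to-back value; the acronym letters for the front-to-back values (with an empty suffix for b-orthogonal) and for the up and down heights are inferred from the relation names quoted in this paper, and the letters for the remaining heights are deliberately left out because they are not spelled out here.

```python
from itertools import product

SIDES = ["left", "straight", "right"]
DEPTHS = ["front", "b-orthogonal", "neutral", "a-orthogonal", "back"]
COARSE_HEIGHTS = ["up", "same", "down"]
FINE_HEIGHTS = ["up", "b-height", "between", "a-height", "down"]


def regions(heights):
    """All (height, side, depth) combinations for the given list of heights."""
    return list(product(heights, SIDES, DEPTHS))


# Assumed acronym letters, inferred from names such as ulf, dl, dln, dla and dlb.
HEIGHT_PREFIX = {"up": "u", "down": "d"}
SIDE_LETTER = {"left": "l", "straight": "s", "right": "r"}
DEPTH_SUFFIX = {"front": "f", "b-orthogonal": "", "neutral": "n",
                "a-orthogonal": "a", "back": "b"}


def acronym(height, side, depth):
    """Acronym of a region at the up or down height, e.g. up-left-front -> ulf."""
    return HEIGHT_PREFIX[height] + SIDE_LETTER[side] + DEPTH_SUFFIX[depth]


DOWN_NAMES = {acronym("down", side, depth) for side, depth in product(SIDES, DEPTHS)}
```

The enumeration gives 15 regions per height, hence 45 regions for three heights and 75 for five.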





Fig. 3. a) the 3-D orientation RS, and b) the corresponding 3-D iconic representation.

Fig. 4. The names inside of the 3-D iconic representation.

For clarity, the 3-D representation has been translated into a 2-D iconic
representation, as shown in figure 2. By convention we will reason with the point
b above the point a. For the cases in which the second point of the front/back
dichotomy is not in the same plane as, or above, the first point of the front/back
dichotomy, we rotate the RS 180 degrees by using the spin operation.

The operations to be represented in this 3-D orientation RS (c wrt ab) are: Identity,
Spin, Inverse, Homing, Shortcut and their inverses. These will be defined in the algebra.



3.2 Algebra 

The algebra consists of seven operations. (The operations, their algebraic notation
and their iconic representation are given in [11,12].)

The operations have been implemented as facts in a PROLOG database. In order to
deal with the disjunction of relationships, the result of applying an operation to any
orientation relationship is a list of relationships. Often this list contains only one
relation (as in the Identity, Inverse or Spin operations), but it allows us to deal with
more than one relation if necessary (as in Homing, Homing Inverse, Shortcut and
Shortcut Inverse).

• Identity (id): The PROLOG facts of the Identity operation would be, for
instance: id(ulf, [ulf]); id(usf, [usf]); etc.

• Inversion (inv): The PROLOG facts of the Inversion operation would be, for
instance: inv(ulf, [drb]); inv(usf, [dsb]); etc.

• Spin (sp): The PROLOG facts of the Spin operation would be, for instance:
sp(ulf, [drf]); sp(usf, [dsf]); etc.

• Homing (hm): The PROLOG facts of the Homing operation would be, for
instance: hm(ulf, [dlb]); hm(usf, [dsb]); etc. Here disjunctions appear, for
example: hm(us, [dlf, dsf, drf, dl, ds, dr, dln, dsn, drn,
dla, dsa, dra, dlb, dsb, drb]).

When we complete the HM operation of the 3-D orientation relationship
left-front-b-height (first row, second column), it happens that c and b are in the same
plane. Therefore, we reduce the five heights to three (a-height, between and b-height
are the same plane). In this case, the result of the HM operation is a disjunction
because we have considered both cases, with and without applying the spin operation.

• Homing Inverse (hmi): The PROLOG facts of the Homing Inverse operation
would be, for instance: hmi(ulf, [ulf]); hmi(usf, [usf]); etc. Disjunctions
also appear here, for example: hmi(us, [ulf, usf, urf, ul, us,
ur, uln, usn, urn, ula, usa, ura, ulb, usb, urb]).

• Shortcut (sc): The PROLOG facts of the Shortcut operation would be, for
instance: sc(ulf, [brn]); sc(usf, [bsn]); etc.

• Shortcut Inverse (sci): The PROLOG facts of the Shortcut Inverse operation
would be, for instance: sci(ulf, [brn]); sci(usf, [bsn]); etc.
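The list-valued facts can be mirrored outside PROLOG as plain lookup tables. The sketch below is an illustrative Python rendering, not the authors' implementation: it copies only the inversion and homing facts quoted above, and the helper `apply_op` (a hypothetical name) applies an operation to a disjunction by applying it to each member and merging the results without duplicates.

```python
# Partial tables holding only the example facts quoted in the text.
INV = {"ulf": ["drb"], "usf": ["dsb"]}
HM = {
    "ulf": ["dlb"],
    "usf": ["dsb"],
    "us": ["dlf", "dsf", "drf", "dl", "ds", "dr", "dln", "dsn", "drn",
           "dla", "dsa", "dra", "dlb", "dsb", "drb"],
}


def apply_op(table, relations):
    """Apply an operation to a disjunction (list) of relations.

    The result is the union of the per-relation results, in first-seen order.
    """
    result = []
    for relation in relations:
        for image in table[relation]:
            if image not in result:
                result.append(image)
    return result
```

For example, `apply_op(INV, ["ulf", "usf"])` gives `["drb", "dsb"]`, and applying `HM` to the disjunction of `usf` and `us` still yields 15 distinct relations because `dsb` is shared.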



3.3 Algebraic Combinations of Operations 

There is a strong inner resemblance between the homing and the shortcut operations,
so only one table [12] is necessary: all the results found in the homing
3-D iconic representation are also found in the shortcut 3-D iconic representation.

Note that the result of combining two operations is another operation,
e.g. INV(SC(x)) = SCI(x) and SC(INV(x)) = HM(x).

An important feature of these operations is their idempotent property [1]. The
inverse, homing inverse and shortcut operations are idempotent of level two (e.g.
INV(INV(c wrt ab)) = c wrt ab) and the homing and shortcut inverse operations are
idempotent of level three (e.g. HM(HM(HM(c wrt ab))) = c wrt ab).

Note that applying the homing operation twice is equivalent to applying
the shortcut inverse operation once (so that SCI(HM(c wrt ab)) = c wrt ab),
and applying the shortcut inverse operation twice is equivalent to applying the
homing operation once (so that HM(SCI(c wrt ab)) = c wrt ab).
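These identities can be checked at the level of which point plays which role. The sketch below represents "c wrt ab" as the triple (c, a, b) and models each operation as a permutation of that triple; the role changes assumed here (INV gives c wrt ba, HM gives a wrt bc, HMI gives a wrt cb, SC gives b wrt ac, SCI gives b wrt ca) follow the usual reading of the 2-D model and are an assumption of this illustration. The sketch tracks only the roles of the points, not the resulting orientation labels.

```python
def inv(t):
    """c wrt ab -> c wrt ba (swap the two reference points)."""
    p, r1, r2 = t
    return (p, r2, r1)


def hm(t):
    """c wrt ab -> a wrt bc (cyclic rotation of the three points)."""
    p, r1, r2 = t
    return (r1, r2, p)


def hmi(t):
    """c wrt ab -> a wrt cb."""
    p, r1, r2 = t
    return (r1, p, r2)


def sc(t):
    """c wrt ab -> b wrt ac."""
    p, r1, r2 = t
    return (r2, r1, p)


def sci(t):
    """c wrt ab -> b wrt ca (the other cyclic rotation)."""
    p, r1, r2 = t
    return (r2, p, r1)


X = ("c", "a", "b")  # stands for "c wrt ab"
```

Under these assumptions the three swaps return to the start after two applications and the two rotations after three, which matches the levels stated above.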
4 The Basic Step of the Inference Process

The Basic Step of the Inference Process (BSIP) used in Freksa and Zimmermann's
qualitative orientation approach is defined as follows (figure 5): "given two relationships
c wrt ab and d wrt bc, we want to obtain the relationship d wrt ab".




Fig. 5. The inference process. 



The inference process among the coarse qualitative orientation relationships has
been represented as an inference table of 75 x 75 entries using our approach [11]. The
first column of the table shows the relationship c wrt ab and the first row of the table
depicts the relationship d wrt bc. The relationship obtained in the composition table is
d wrt ab. The result of the inference is always one of the seventy-five relationships or
a disjunction of them.



4.1 The Inference Table 

In order to visualise the reasoning procedure by means of tables, only a subset of the
neighbourhood relationships is reflected. The arrangement of relationships in rows
and columns of the fine 75 x 75 table is obtained following this idea (shown in [11]).

The PROLOG facts of the Inference Table would be, for instance:
inf_table(ulf, ulf, [ulf, ul, uln, ula, ulb]);
inf_table(usf, usf, [usf]); etc.

5 The Full Inference Process 

The Full Inference Process (FIP) is the other part in the reasoning process. When 
more relationships among several spatial landmarks are provided, then the FIP is 
necessary. Here a Constraint Solver (CS) for qualitative orientation will be explained. 
This CS, which implements a path consistency algorithm, is based on Constraint 
Logic Programming (CLP) extended with Constraint Handling Rules (CHRs). 

5.1 Qualitative Orientation as a Constraint Satisfaction Problem 

Our qualitative orientation model involves three spatial objects (a, b and c), therefore
the constraints which deal with this information are ternary. The Constraint
Satisfaction Problem (CSP) is reformulated for these ternary constraints (c wrt ab)
as follows: given a set of variables $\{X_1, \dots, X_n\}$, a discrete and finite domain for each
variable $\{D_1, \dots, D_n\}$, and a set of constraints $\{c_{c,ab}(X_c, X_a, X_b)\}$, which define the
relationship between every group of three variables $(X_c, X_a, X_b)$, $(1 \le a < b < c \le n)$; the
problem is to find an assignment of values $(v_c, v_a, v_b)$, $v_i \in D_i$, to variables such that all
constraints are satisfied, i.e. $c_{c,ab}(X_c, X_a, X_b)$ is true for every a, b, c $(1 \le a < b < c \le n)$.

We redefine a network of ternary constraints as path consistent for triples of nodes
(c, a, b) and all paths $a$-$b$-$i_1$-...-$i_{n-1}$-$i_n$ between them, if the direct constraint $c_{c,ab}$ is tighter
(has less disjunction) than the indirect constraint along the path, i.e. the composition
($\otimes$) of the constraints along the path. To determine whether a graph is complete,
we repeat the following operation until a fixed point is reached:

$$c_{d,ab} := c_{d,ab} \oplus c_{c,ab} \otimes c_{d,bc}$$



5.2 The Path Consistency Algorithm for Qualitative Orientation 

The following constraint satisfaction algorithm for complete disjunctive ternary
constraint networks is defined using PROLOG extended with CHRs. PROLOG
provides backtracking and CHRs are used to implement path consistency at a high
level of abstraction.

The constraint $c_{c,ab}$ is represented in the algorithm by the predicate
cr_or(C,A,B,Rel), where Rel is the list of primitive spatial orientation
relationships forming the disjunctive constraint. The path consistency operation
is implemented by means of two kinds of CHRs. The part of the operation
corresponding to the intersection $c_{d,ab} \oplus \dots$ is implemented by simplification CHRs:

cr_or(C,A,B,R1), cr_or(C,A,B,R2) <=>
    inter(R1,R2,R3) | cr_or(C,A,B,R3)

The part corresponding to the composition $c_{c,ab} \otimes c_{d,bc}$ is implemented by propagation CHRs:

cr_or(C,A,B,R1), cr_or(D,B,C,R2) ==>
    compo(R1,R2,R3) | newc(D,A,B,R3)

Termination is guaranteed because the simplification rule replaces R1 and R2 by
the result R3 of intersecting R1 with R2 (and R3 is the same as R1 or R2 or smaller)
and because propagation CHRs are never repeated for the same constraint goals, as
will be shown.
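The interplay of the two rule kinds can be imitated with an ordinary fixed-point loop. The sketch below is a plain Python approximation rather than the CHR program: constraints are stored in a dictionary keyed by the triple (c, a, b), composition plays the role of the propagation rule, and intersection plays the role of the simplification rule. The three relations and the composition table are made up for illustration and are not the paper's 75 x 75 table.

```python
from itertools import product

RELATIONS = {"l", "s", "r"}

# Hypothetical composition table for single relations (illustration only).
TABLE = {
    ("l", "l"): {"l"}, ("r", "r"): {"r"}, ("s", "s"): {"s"},
    ("l", "r"): {"l", "s", "r"}, ("r", "l"): {"l", "s", "r"},
    ("s", "l"): {"l"}, ("s", "r"): {"r"},
    ("l", "s"): {"l"}, ("r", "s"): {"r"},
}


def compose(rels_c, rels_d):
    """Composition of two disjunctions: union of the table entries of all pairs."""
    out = set()
    for r1, r2 in product(rels_c, rels_d):
        out |= TABLE[(r1, r2)]
    return out


def path_consistency(constraints):
    """constraints: {(c, a, b): relations of 'c wrt ab'}. Returns the tightened network."""
    net = {key: set(rels) for key, rels in constraints.items()}
    changed = True
    while changed:
        changed = False
        for (c, a, b), rels_c in list(net.items()):
            for (d, b2, c2), rels_d in list(net.items()):
                if (b2, c2) != (b, c):
                    continue  # need the pair 'c wrt ab' and 'd wrt bc'
                current = net.get((d, a, b), set(RELATIONS))
                tightened = current & compose(rels_c, rels_d)
                if not tightened:
                    raise ValueError("inconsistent network")
                if tightened != current:
                    net[(d, a, b)] = tightened  # sets only shrink, so the loop terminates
                    changed = True
    return net


NETWORK = {("c", "a", "b"): {"l"}, ("d", "b", "c"): {"l", "s"}, ("d", "a", "b"): {"l", "r"}}
RESULT = path_consistency(NETWORK)
```

In this toy network the constraint for d wrt ab is tightened from {l, r} to {l}; an empty intersection is reported as an inconsistency, which corresponds to the special-case rule that fails on an empty relation list.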

The algorithm is based on the algorithm developed in [1]. The optimisation
introduced in the algorithm of [10] (named PC-2) has also been included. This
optimisation is based on the idea that the constraint $c_{c,ba}$ can be computed as the
converse of $c_{c,ab}$ if it is needed (by applying the inverse operation to the corresponding
relationship), which saves half of the computation.

Here disjunctions could also appear in the first and second arguments; in those cases
the operations of union, intersection and composition of disjunctive ternary constraints
must be used.

The operations of union, intersection and composition are formally redefined for
disjunctive ternary constraints in this section. (All these operations are associative.)
The union of disjunctive ternary constraints can be formulated as follows:

$$c_{c,ab} \cup c'_{c,ab} := c\{r_1, \dots, r_n\}ab \lor c\{s_1, \dots, s_m\}ab = c\,(\{r_1, \dots, r_n\} \cup \{s_1, \dots, s_m\})\,ab$$

The intersection is defined as follows:

$$c_{c,ab} \cap c'_{c,ab} := c\{r_1, \dots, r_n\}ab \land c\{s_1, \dots, s_m\}ab = c\,(\{r_1, \dots, r_n\} \cap \{s_1, \dots, s_m\})\,ab$$

The composition of disjunctive ternary constraints is defined as follows:

$$c_{c,ab} \otimes c'_{d,bc} := c\{r_1, \dots, r_n\}ab \land d\{s_1, \dots, s_m\}bc = d\,\{r \otimes s \mid r \in \{r_1, \dots, r_n\},\ s \in \{s_1, \dots, s_m\}\}\,ab$$

where $\otimes$ is the basic step of the inference process.

A disjunctive ternary constraint $c_{c,ab}$ between the variables a, b and c, also written
$c\{r_1, \dots, r_n\}ab$, is a disjunction $(c\ r_1\ ab) \lor \dots \lor (c\ r_n\ ab)$ where each $r_i$ is a relation that
is applicable to c and ab.
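The three operations on disjunctive constraints reduce to set operations on relation lists. The following sketch is an illustrative Python rendering: the table `INF_TABLE` contains only the two inference-table facts quoted earlier in this paper, and the function names are hypothetical. Composition looks up every pair of basic relations in the table and collects the results.

```python
# Only the two inference-table facts quoted in the text; the full table has 75 x 75 entries.
INF_TABLE = {
    ("ulf", "ulf"): ["ulf", "ul", "uln", "ula", "ulb"],
    ("usf", "usf"): ["usf"],
}


def union(rels1, rels2):
    """Union of two disjunctive constraints over the same triple of points."""
    return sorted(set(rels1) | set(rels2))


def intersection(rels1, rels2):
    """Intersection of two disjunctive constraints over the same triple of points."""
    return sorted(set(rels1) & set(rels2))


def compose(rels_c_ab, rels_d_bc, table):
    """Composition: all table results r (x) s for r in 'c wrt ab' and s in 'd wrt bc'."""
    out = set()
    for r in rels_c_ab:
        for s in rels_d_bc:
            out.update(table[(r, s)])
    return sorted(out)
```

Sorting the result is a presentation choice of this sketch; the constraints themselves are unordered disjunctions.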

5.2.1 The Algorithm 

Part of the path consistency algorithm to propagate compositions of disjunctive
qualitative orientation relationships is shown here.

% Constraint declarations and definitions
constraints (cr_or)/4, (cr_or)/6.
label_with cr_or(N,C,A,B,Rel,I) if N>1.
cr_or(N,C,A,B,Rel,I) :- member(R,Rel), cr_or(1,C,A,B,[R],I).

% Initialise
cr_or(C,A,B,R) <=> length(R,N) | cr_or(N,C,A,B,R,1).

% Special cases
cr_or(N,C,A,B,R,I) <=> empty(R) | false.
cr_or(N,C,A,A,R,I) <=> true.
cr_or(N,C,C,B,R,I) <=> contains_equality_a(R) | true.
cr_or(N,C,A,C,R,I) <=> contains_equality_b(R) | true.
cr_or(N,C,A,B,R,I) <=> N=75 | true.

% Intersection
cr_or(N1,C,A,B,R1,I), cr_or(N2,C,A,B,R2,J) <=> inter(R1,R2,R3),
    length(R3,N3), K is min(I,J) | cr_or(N3,C,A,B,R3,K).
cr_or(N1,B,C,A,R1,I), cr_or(N2,C,A,B,R2,J) <=> hm_op(R1,R11),
    inter(R11,R2,R3), length(R3,N3), K is min(I,J) | cr_or(N3,C,A,B,R3,K).
cr_or(N1,C,A,B,R1,I), cr_or(N2,B,C,A,R2,J) <=> hm_op(R2,R22),
    inter(R1,R22,R3), length(R3,N3), K is min(I,J) | cr_or(N3,C,A,B,R3,K).
...

% Composition
cr_or(N1,C,A,B,R1,I), cr_or(N2,D,B,C,R2,J) ==> I=1,
    comp(R1,R2,R3), length(R3,N3), K is I+J | cr_or(N3,D,A,B,R3,K).
cr_or(N1,B,C,A,R1,I), cr_or(N2,D,B,C,R2,J) ==> I=1, singleton(R1), hm_op(R1,R11),
    comp(R11,R2,R3), length(R3,N3), K is I+J | cr_or(N3,D,A,B,R3,K).
cr_or(N1,C,A,B,R1,I), cr_or(N2,C,D,B,R2,J) ==> I=1, singleton(R2), hm_op(R2,R22),
    comp(R1,R22,R3), length(R3,N3), K is I+J | cr_or(N3,D,A,B,R3,K).
...

The predicates under % Constraint declarations and definitions, % Initialise and
% Special cases are introduced at the beginning of our database.

Simplification CHRs (rules under % Intersection) and propagation CHRs (rules under
% Composition) perform intersections (which permit the simplification of redundant
information) and compositions. In each group, the first rule implements the operation
(intersection or composition, respectively) as originally defined in the rules in point 5.2.

By applying the five operations (HM, SC, INV, HMI and SCI) to the first
constraint of the two in the head of the original rule (point 5.2), it is
possible to obtain the orientation information among the same three spatial objects.
(Only the hm_op rule is shown in this algorithm.) Therefore, it is possible to calculate
results if the corresponding operations are applied to the relationship or disjunction of
relationships in the guard part of the rules. The application of the above operations to
a disjunction of relationships is equivalent to the application of these operations to
each relationship included in the disjunction.

Notice that the operations that are idempotent of level three are treated
differently from the rest of the operations. If the HM operation is applied to the usual
definition of the orientation ternary constraint (c wrt ab), the order of the variables in
the constraint becomes (a wrt bc). However, if the HM operation is applied again to
the result (to a wrt bc), instead of obtaining the original relationship (which is what is
expected when the operations are idempotent of level two), the relationship (b wrt ca)
is obtained.

The second simplification CHR corresponds to the above explanation. The third
simplification CHR corresponds to the case in which the operations are applied to the
second constraint of the two which appear in the head of the original intersection rule
(the first rule in point 5.2). A total of 11 CHRs to compute results would be defined.

6 Conclusions and Future Works 

We have left out of this paper some future work: the integration of different levels 
of granularity and the application of the 3-D orientation model to mobile robots with 
an arm manipulator on it. 
Abstract. Equilibrium logic is an approach to nonmonotonic reasoning that
generalises the stable model and answer set semantics for logic programs. We present
a method to implement equilibrium logic and, as a special case, stable models for
logic programs with nested expressions, based on polynomial reductions to
quantified Boolean formulas (QBFs). Since there now exist efficient QBF-solvers, this
reduction technique yields a practically relevant approach to rapid prototyping.
The reductions for logic programs with nested expressions generalise previous
results presented for other types of logic programs. We use these reductions to derive
complexity results for the systems in question. In particular, we show that deciding
whether a program with nested expressions has a stable model is $\Sigma_2^P$-complete.



1 Introduction 

Equilibrium logic, developed in [22], is an approach to nonmonotonic reasoning that 
generalises the stable model and answer set semantics for logic programs. It is based 
on a nonclassical logic known as here-and-there, which is intermediate between clas- 
sical logic and intuitionistic logic, and it provides a means to extend the reasoning 
mechanism associated with answer sets beyond the syntactic limitations of normal and 
disjunctive logic programs. Here we discuss a method to implement equilibrium logic
and, as a special case, stable models for logic programs with nested expressions [19],
based on polynomial reductions to quantified Boolean formulas (QBFs). A similar
reduction technique was previously applied to other nonmonotonic reasoning formalisms,
see, e.g., [7,5,3]. This in turn can be seen as a natural generalisation of a similar method
successfully applied to NP-problems (see, e.g., [14]). Since there now exist efficient 
QBF-solvers [2,9,11,16,27], this reduction technique yields a practically relevant ap- 
proach to rapid prototyping. 

The reductions for logic programs with nested expressions generalise results presented
for other types of logic programs in [7]. Although Lifschitz, Tang and Turner already
provided a translation of nested expressions into an extension of standard logic
programming [19], their translation is in general not polynomial because it is based on
distributivity laws. In particular, this means that we cannot straightforwardly derive
from it complexity results on reasoning with nested expressions. Here, however, we
shall use our reduction techniques to discuss complexity issues as well as some other
consequences of our results; in addition we deal briefly with some matters regarding
implementation.

* This work was partially supported by the Austrian Science Fund Project under grant P15068.

The remainder of the paper is laid out as follows. First we review the logical frame- 
work of quantified Boolean formulas. We then turn in Section 3 to equilibrium logic 
which is defined as a particular form of minimal-model reasoning in the logic of here- 
and-there. We show how to re-express satisfiability in here-and-there in the setting of 
classical logic and from there we proceed to characterise the central concept of equilib- 
rium models using QBFs. Section 4 is devoted to logic programs with nested expressions 
as presented in [19]. Already [22] showed how stable models for normal and disjunctive 
programs can be regarded as equilibrium models. Later in [18] this result was extended 
to programs with nested expressions. From the main result of Section 3 it then follows 
that the property of being a stable model of this kind of program can also be expressed 
by QBFs. We show how to formulate this reduction in such a way that it leads to a linear 
translation of programs into QBFs. 

Section 5 looks at the complexity of various reasoning tasks involving the logic 
of here-and-there and logic programs. In particular it is shown how the well-known 
property that deciding whether a disjunctive program has a stable model is $\Sigma_2^P$-complete
extends to the case of nested expressions. Another result obtained here concerns the 
strong equivalence of logic programs, studied, e.g., in [18]. It was shown there that two 
programs are strongly equivalent if and only if they are logically equivalent in the logic 
of here-and-there. Our reduction techniques can be used to show that deciding strong 
equivalence of logic programs has coNP complexity. 

In Section 6 we look briefly at an architecture for implementing the reductions and 
in Section 7 we draw conclusions and consider related and future work. The longer proof 
of Section 3 (Lemma 2) is relegated to an appendix. 



2 Preliminaries 



We deal with propositional languages and use the logical symbols $\top$, $\bot$, $\neg$,
$\lor$, $\land$, and $\to$ to construct formulas in the standard way. We write
$\mathcal{L}_V$ to denote a language over an alphabet $V$ of propositional variables or
atoms. Formulas are denoted by Greek lower-case letters (possibly with subscripts). As
usual, a literal is either an atom $p$ (a positive literal) or a negated atom $\neg p$ (a
negative literal). Furthermore, the logical complexity, $lc(\phi)$, of a formula $\phi$ is
the number of occurrences of the logical symbols $\neg$, $\lor$, $\land$, and $\to$ in
$\phi$.

Given an alphabet $V$, we define a disjoint alphabet $V'$ as $V' = \{p' \mid p \in V\}$.
Then, for $\alpha \in \mathcal{L}_V$, we define $\alpha'$ as the result of replacing in
$\alpha$ each atom $p$ from $V$ by the corresponding atom $p'$ in $V'$ (so implicitly there
is an isomorphism between $V$ and $V'$). This is defined analogously for sets of formulas.

Quantified Boolean formulas (QBFs) generalise ordinary propositional formulas by the
admission of quantifications over propositional variables (QBFs are denoted by Greek
upper-case letters). Informally, a QBF of the form $\forall p\, \exists q\, \Phi$ means that
for all truth assignments of $p$ there is a truth assignment of $q$ such that $\Phi$ is
true. For instance, it is easily seen that the QBF
$\exists p_1 \exists p_2 ((p_1 \to p_2) \land \forall p_3 (p_3 \to p_2))$ evaluates to true.

The precise semantical meaning of QBFs is defined as follows. First, some ancillary
notation. An occurrence of a variable $v$ in a QBF $\Phi$ is free iff it does not appear in
the scope of a quantifier $Qv$ ($Q \in \{\forall, \exists\}$), otherwise the occurrence of
$v$ is bound. If $\Phi$ contains no free variables, then $\Phi$ is closed, otherwise $\Phi$
is open. Furthermore, $\Phi[v_1/\phi_1, \ldots, v_n/\phi_n]$ denotes the result of uniformly
substituting the free occurrences of variables $v_i$ in $\Phi$ by $\phi_i$
($1 \le i \le n$). By a (classical) interpretation, $I$, we understand a set of variables.
Informally, a variable $v$ is true under $I$ iff $v \in I$. In general, the truth value,
$\nu_I(\Phi)$, of a QBF $\Phi$ under an interpretation $I$ is recursively defined as follows:

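1. if $\Phi = \top$, then $\nu_I(\Phi) = 1$;

2. if $\Phi = \bot$, then $\nu_I(\Phi) = 0$;

3. if $\Phi = v$ is an atom, then $\nu_I(\Phi) = 1$ if $v \in I$, and $\nu_I(\Phi) = 0$ otherwise;

4. if $\Phi = \neg\Psi$, then $\nu_I(\Phi) = 1 - \nu_I(\Psi)$;

5. if $\Phi = (\Phi_1 \land \Phi_2)$, then $\nu_I(\Phi) = \min(\{\nu_I(\Phi_1), \nu_I(\Phi_2)\})$;

6. if $\Phi = (\Phi_1 \lor \Phi_2)$, then $\nu_I(\Phi) = \max(\{\nu_I(\Phi_1), \nu_I(\Phi_2)\})$;

7. if $\Phi = (\Phi_1 \to \Phi_2)$, then $\nu_I(\Phi) = 1$ iff $\nu_I(\Phi_1) \le \nu_I(\Phi_2)$;

8. if $\Phi = \forall v\, \Psi$, then $\nu_I(\Phi) = \nu_I(\Psi[v/\top] \land \Psi[v/\bot])$;

9. if $\Phi = \exists v\, \Psi$, then $\nu_I(\Phi) = \nu_I(\Psi[v/\top] \lor \Psi[v/\bot])$.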
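The nine clauses above can be read directly as a recursive evaluator. The following Python sketch is only an illustration and not part of the original system: the tuple encoding of formulas and the names `subst` and `value` are assumptions made here. It evaluates the closed QBF $\exists p_1 \exists p_2 ((p_1 \to p_2) \land \forall p_3 (p_3 \to p_2))$.

```python
# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("top",), ("bot",), ("atom", name), ("not", f),
#   ("and", f, g), ("or", f, g), ("imp", f, g),
#   ("forall", name, f), ("exists", name, f)
TOP, BOT = ("top",), ("bot",)

def subst(f, var, const):
    """Replace the free occurrences of atom `var` in `f` by TOP or BOT."""
    op = f[0]
    if op in ("top", "bot"):
        return f
    if op == "atom":
        return const if f[1] == var else f
    if op == "not":
        return ("not", subst(f[1], var, const))
    if op in ("and", "or", "imp"):
        return (op, subst(f[1], var, const), subst(f[2], var, const))
    # Quantifier node (op, bound_var, body): stop if `var` is rebound here.
    return f if f[1] == var else (op, f[1], subst(f[2], var, const))

def value(f, I):
    """Truth value (0 or 1) of the QBF `f` under the interpretation `I` (a set of atoms)."""
    op = f[0]
    if op == "top":
        return 1
    if op == "bot":
        return 0
    if op == "atom":
        return 1 if f[1] in I else 0
    if op == "not":
        return 1 - value(f[1], I)
    if op == "and":
        return min(value(f[1], I), value(f[2], I))
    if op == "or":
        return max(value(f[1], I), value(f[2], I))
    if op == "imp":
        return 1 if value(f[1], I) <= value(f[2], I) else 0
    if op in ("forall", "exists"):
        # Clauses 8 and 9: expand the quantifier over both truth constants.
        glue = "and" if op == "forall" else "or"
        return value((glue, subst(f[2], f[1], TOP), subst(f[2], f[1], BOT)), I)
    raise ValueError(f"unknown connective: {op}")

p1, p2, p3 = ("atom", "p1"), ("atom", "p2"), ("atom", "p3")
example = ("exists", "p1", ("exists", "p2",
           ("and", ("imp", p1, p2), ("forall", "p3", ("imp", p3, p2)))))
print(value(example, set()))  # prints 1
```

The expansion of quantifiers is exponential in the number of bound variables, so this is only suitable for checking small examples by hand, not as a substitute for a QBF solver.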

We say that $\Phi$ is true under $I$ iff $\nu_I(\Phi) = 1$, otherwise $\Phi$ is false under
$I$. If $\nu_I(\Phi) = 1$, then $I$ is a model of $\Phi$. Likewise, for a set $S$ of formulas,
if $\nu_I(\phi) = 1$ for all $\phi \in S$, then $I$ is a model of $S$. If $\Phi$ has some
model, then $\Phi$ is said to be satisfiable. If $\Phi$ is true under any interpretation,
then $\Phi$ is valid. Observe that a closed QBF is either valid or unsatisfiable, because
closed QBFs are either true under each interpretation or false under each interpretation.
Hence, for closed QBFs, there is no need to refer to particular interpretations.

We note the following elementary property, which will be relevant later on. Let $\Phi$
be a QBF whose free variables are given by $V$, and let $W \subseteq V$. Then, $I$ is a
model of $\exists W \Phi$ iff there is some $J \subseteq W$ such that $J \cup I$ is a model
of $\Phi$.

In the sequel, we employ the following abbreviations in the context of QBFs: Let
$S = \{\phi_1, \ldots, \phi_n\}$ and $T = \{\psi_1, \ldots, \psi_n\}$ be indexed sets of
formulas. Then, $S \le T$ abbreviates $\bigwedge_{i=1}^{n} (\phi_i \to \psi_i)$, and $S < T$
is a shorthand for $(S \le T) \land \neg(T \le S)$. Furthermore, for a set
$P = \{p_1, \ldots, p_n\}$ of propositional variables and a quantifier
$Q \in \{\forall, \exists\}$, we let $QP\, \Phi$ stand for the formula
$Qp_1 Qp_2 \cdots Qp_n\, \Phi$. Finally, finite sets $T = \{\phi_1, \ldots, \phi_n\}$ of
formulas are usually identified with the conjunction $\bigwedge_{i=1}^{n} \phi_i$ of their
elements.

The operators $\le$ and $<$ are fundamental tools for expressing certain tests on sets
of atoms. In particular, the following properties hold: Let $V = \{v_1, \ldots, v_n\}$ and
$W = \{w_1, \ldots, w_n\}$ be two sets of indexed atoms, and let $I_1 \subseteq V$ and
$I_2 \subseteq W$ be two interpretations. Then, (i) $I_1 \cup I_2$ is a model of $V \le W$
iff $I_1 \subseteq I_2$, and (ii) $I_1 \cup I_2$ is a model of $V < W$ iff
$I_1 \subset I_2$, where inclusion is understood modulo the correspondence
$v_i \mapsto w_i$.
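As a sanity check of properties (i) and (ii), the following Python sketch enumerates all pairs of interpretations over two three-element indexed alphabets and compares the truth of $V \le W$ and $V < W$ with set inclusion. The function names `leq`, `lt` and `indices` are hypothetical, and inclusion between $I_1$ and $I_2$ is read index-wise, which is an assumption of this sketch about how the correspondence between $v_i$ and $w_i$ is meant.

```python
from itertools import combinations

def powerset(atoms):
    atoms = list(atoms)
    return [set(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def leq(V, W, I):
    """Truth of V <= W, i.e. the conjunction of (v_i -> w_i), under I."""
    return all(v not in I or w in I for v, w in zip(V, W))

def lt(V, W, I):
    """Truth of V < W, i.e. (V <= W) and not (W <= V), under I."""
    return leq(V, W, I) and not leq(W, V, I)

def indices(names, I):
    """Positions i such that the i-th atom of `names` is true under I."""
    return {i for i, a in enumerate(names) if a in I}

V = ["v1", "v2", "v3"]
W = ["w1", "w2", "w3"]

ok = True
for I1 in powerset(V):
    for I2 in powerset(W):
        I = I1 | I2
        a, b = indices(V, I1), indices(W, I2)
        # (i): V <= W holds iff I1 is included in I2 (index-wise);
        # (ii): V < W holds iff the inclusion is proper.
        ok = ok and (leq(V, W, I) == (a <= b)) and (lt(V, W, I) == (a < b))
print(ok)  # prints True
```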

3 Equilibrium Logic 

We start our discussion with equilibrium logic, an approach to nonmonotonic reasoning 
that generalises the answer set semantics [10] for logic programs. Equilibrium logic was 
introduced in [22] and further investigated in [23]; proof theoretic studies of the logic can
be found in [25,24]. The main difference here is that we do not consider languages with
a second kind of negation, strong negation, needed for capturing answer set semantics
for logic programs with two kinds of negation.

The main result of this section is to show how equilibrium logic can be mapped to
quantified Boolean formulas in polynomial time. This translation will be constructed in
two steps: First, since equilibrium logic is defined as a special form of minimal-model
reasoning in the (nonclassical) logic of here-and-there, we translate the logic of
here-and-there into classical logic. Afterwards, we use this translation to encode
equilibrium logic in terms of QBFs.¹

Generally speaking, the logic of here-and-there is an important tool for analysing 
various properties of logic programs. For instance, as shown in [18], the problem of 
checking whether two logic programs are strongly equivalent can be expressed in terms 
of the logic of here-and-there. 

In what follows, we use the propositional language $\mathcal{L}_V$ defined in Section 2,
where $V$ is some alphabet.

The semantics of the logic of here-and-there is defined in terms of two worlds, $H$
and $T$, called “here” and “there”. It is assumed that there is a total order, $\le$,
defined between these worlds such that $\le$ is reflexive and $H \le T$. As in ordinary
Kripke semantics for intuitionistic logic, we can imagine that in each world a set of atoms
is verified and that, once verified “here”, an atom remains verified “there”. Formally, by
an HT-interpretation, $\mathcal{I}$, we understand an ordered pair
$\langle I_H, I_T \rangle$ of sets of atoms such that $I_H \subseteq I_T$. Then, the truth
value, $\nu_{\mathcal{I}}(w, \phi)$, of a formula $\phi$ at a world $w \in \{H, T\}$ in
an HT-interpretation $\mathcal{I} = \langle I_H, I_T \rangle$ is recursively defined as
follows:

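1. if $\phi = \top$, then $\nu_{\mathcal{I}}(w, \phi) = 1$;

2. if $\phi = \bot$, then $\nu_{\mathcal{I}}(w, \phi) = 0$;

3. if $\phi = v$ is an atom, then $\nu_{\mathcal{I}}(w, \phi) = 1$ if $v \in I_w$, and
$\nu_{\mathcal{I}}(w, \phi) = 0$ otherwise;

4. if $\phi = \neg\psi$, then $\nu_{\mathcal{I}}(w, \phi) = 1$ if for every world $u$ such
that $w \le u$, $\nu_{\mathcal{I}}(u, \psi) = 0$, and $\nu_{\mathcal{I}}(w, \phi) = 0$
otherwise;

5. if $\phi = (\phi_1 \land \phi_2)$, then
$\nu_{\mathcal{I}}(w, \phi) = \min(\{\nu_{\mathcal{I}}(w, \phi_1), \nu_{\mathcal{I}}(w, \phi_2)\})$;

6. if $\phi = (\phi_1 \lor \phi_2)$, then
$\nu_{\mathcal{I}}(w, \phi) = \max(\{\nu_{\mathcal{I}}(w, \phi_1), \nu_{\mathcal{I}}(w, \phi_2)\})$;

7. if $\phi = (\phi_1 \to \phi_2)$, then $\nu_{\mathcal{I}}(w, \phi) = 1$ if for every world
$u$ such that $w \le u$, $\nu_{\mathcal{I}}(u, \phi_1) \le \nu_{\mathcal{I}}(u, \phi_2)$, and
$\nu_{\mathcal{I}}(w, \phi) = 0$ otherwise.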

We say that $\phi$ is true under $\mathcal{I}$ in $w$ iff $\nu_{\mathcal{I}}(w, \phi) = 1$,
otherwise $\phi$ is false under $\mathcal{I}$ in $w$. An HT-interpretation
$\mathcal{I} = \langle I_H, I_T \rangle$ satisfies $\phi$, or $\mathcal{I}$ is an HT-model
of $\phi$, iff $\nu_{\mathcal{I}}(H, \phi) = 1$. If $\phi$ is true under any
HT-interpretation, then $\phi$ is HT-valid. An HT-interpretation $\langle I_H, I_T \rangle$
is total if $I_H = I_T$.

It is easily seen that any HT-valid formula is valid in classical logic, but the converse
does not always hold. For instance, $p \lor \neg p$ and $\neg\neg p \to p$ are valid in
classical logic but not in the logic of here-and-there, because
$\mathcal{I} = \langle \emptyset, \{p\} \rangle$ is not an HT-model for either of these
formulas.
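The two-world semantics is small enough to be executed directly. The following Python sketch follows the seven truth conditions given above; the tuple encoding of formulas and the names `ht_value` and `is_ht_model` are assumptions of this illustration. It confirms that $\langle \emptyset, \{p\} \rangle$ falsifies both $p \lor \neg p$ and $\neg\neg p \to p$, while the total interpretation $\langle \{p\}, \{p\} \rangle$ satisfies them.

```python
# Formulas: ("top",), ("bot",), ("atom", name), ("not", f),
#           ("and", f, g), ("or", f, g), ("imp", f, g)

def ht_value(f, w, IH, IT):
    """Truth value (0 or 1) of `f` at world `w` ("H" or "T") in the pair (IH, IT)."""
    # Worlds u with w <= u: both worlds when w is "here", only "there" otherwise.
    above = ("H", "T") if w == "H" else ("T",)
    atoms_at = {"H": IH, "T": IT}
    op = f[0]
    if op == "top":
        return 1
    if op == "bot":
        return 0
    if op == "atom":
        return 1 if f[1] in atoms_at[w] else 0
    if op == "not":
        return 1 if all(ht_value(f[1], u, IH, IT) == 0 for u in above) else 0
    if op == "and":
        return min(ht_value(f[1], w, IH, IT), ht_value(f[2], w, IH, IT))
    if op == "or":
        return max(ht_value(f[1], w, IH, IT), ht_value(f[2], w, IH, IT))
    if op == "imp":
        return 1 if all(ht_value(f[1], u, IH, IT) <= ht_value(f[2], u, IH, IT)
                        for u in above) else 0
    raise ValueError(f"unknown connective: {op}")

def is_ht_model(f, IH, IT):
    """An HT-interpretation satisfies f iff f is true in the world "here"."""
    if not IH <= IT:
        raise ValueError("an HT-interpretation requires IH to be a subset of IT")
    return ht_value(f, "H", IH, IT) == 1

p = ("atom", "p")
excluded_middle = ("or", p, ("not", p))
double_negation = ("imp", ("not", ("not", p)), p)

print(is_ht_model(excluded_middle, set(), {"p"}))   # prints False
print(is_ht_model(double_negation, set(), {"p"}))   # prints False
print(is_ht_model(double_negation, {"p"}, {"p"}))   # prints True
```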

¹ The logic of here-and-there, whose name is motivated below, is also commonly known as
Gödel's 3-valued logic in view of Gödel's paper [12]. It was first presented in the form of
truth matrices by Heyting in [13] and first axiomatised by Łukasiewicz in [21].
Equilibrium logic can be seen as a particular type of reasoning with minimal HT-models.
Formally, an equilibrium model of a formula $\phi$ is a total HT-interpretation
$\langle I, I \rangle$ such that (i) $\langle I, I \rangle$ is an HT-model of $\phi$, and
(ii) for every proper subset $J$ of $I$, $\langle J, I \rangle$ is not an HT-model of $\phi$.

We proceed with the following two results, which we use frequently later on: 

Proposition 1. For any HT-interpretation $\mathcal{I} = \langle I_H, I_T \rangle$ and any
propositional formula $\phi$, the following relations hold:

1. $\nu_{\mathcal{I}}(T, \phi) = 1$ iff $\nu_{I_T}(\phi) = 1$;

2. $\nu_{\mathcal{I}}(H, \phi) = 1$ implies $\nu_{\mathcal{I}}(T, \phi) = 1$.

Observe that the first part of this proposition states that $\phi$ is true under
$\mathcal{I} = \langle I_H, I_T \rangle$ in the world $T$ iff $\phi$ is true under $I_T$ in
classical logic.

Proposition 2. A total HT-interpretation $\langle I, I \rangle$ is an HT-model of $\phi$ iff
$I$ is a model of $\phi$ in classical logic.

Next, we express satisfiability in the logic of here-and-there in terms of satisfiability 
in classical logic. We start with the following translation: 

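Definition 1. Let $\phi$ be a formula. Then, $\tau[\phi]$ is recursively defined as follows:

1. if $\phi$ is an atom, or one of $\top$ or $\bot$, then $\tau[\phi] = \phi$;

2. if $\phi = (\phi_1 \circ \phi_2)$, for $\circ \in \{\land, \lor\}$, then
$\tau[\phi] = \tau[\phi_1] \circ \tau[\phi_2]$;

3. if $\phi = \neg\psi$, then $\tau[\phi] = \neg\tau[\psi] \land \neg\psi'$;

4. if $\phi = (\phi_1 \to \phi_2)$, then
$\tau[\phi] = (\tau[\phi_1] \to \tau[\phi_2]) \land (\phi_1' \to \phi_2')$.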

First of all, let us note that this transformation is quadratic in the size of the input
formula. More precisely, we have the following property:

Lemma 1. Let $\tau[\cdot]$ be the transformation defined above. Then,

$lc(\tau[\phi]) \le lc(\phi)(lc(\phi) + 2)$, for any formula $\phi$.

Proof. By induction on $lc(\phi)$. □
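To make the recursion of Definition 1 and the bound of Lemma 1 concrete, here is a short Python sketch under an assumed tuple encoding of formulas; the helper names `prime`, `tau` and `lc` are chosen for this illustration only. It reads the case for negation as $\tau[\neg\psi] = \neg\tau[\psi] \land \neg\psi'$ and the case for implication as $\tau[\phi_1 \to \phi_2] = (\tau[\phi_1] \to \tau[\phi_2]) \land (\phi_1' \to \phi_2')$.

```python
# Formulas: ("top",), ("bot",), ("atom", name), ("not", f),
#           ("and", f, g), ("or", f, g), ("imp", f, g)

def prime(f):
    """Rename every atom p occurring in f to its primed copy p'."""
    op = f[0]
    if op in ("top", "bot"):
        return f
    if op == "atom":
        return ("atom", f[1] + "'")
    return (op,) + tuple(prime(g) for g in f[1:])

def tau(f):
    """The translation of Definition 1 from here-and-there into classical logic."""
    op = f[0]
    if op in ("top", "bot", "atom"):
        return f
    if op in ("and", "or"):
        return (op, tau(f[1]), tau(f[2]))
    if op == "not":
        return ("and", ("not", tau(f[1])), ("not", prime(f[1])))
    if op == "imp":
        return ("and", ("imp", tau(f[1]), tau(f[2])),
                       ("imp", prime(f[1]), prime(f[2])))
    raise ValueError(f"unknown connective: {op}")

def lc(f):
    """Logical complexity: the number of connective occurrences in f."""
    if f[0] in ("top", "bot", "atom"):
        return 0
    return 1 + sum(lc(g) for g in f[1:])

p = ("atom", "p")
phi = ("imp", ("not", ("not", p)), p)          # the formula  not not p -> p
print(lc(phi), lc(tau(phi)))                   # prints 3 12
print(lc(tau(phi)) <= lc(phi) * (lc(phi) + 2)) # prints True (12 <= 15)
```

The growth comes from the primed copies $\psi'$ that are duplicated at every negation and implication, which is why nested negations or implications give a quadratic rather than a linear bound.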

Intuitively, the primed formulas in $\tau[\phi]$ correspond to formulas evaluated in the
world “there”, whilst unprimed formulas correspond to formulas evaluated in “here”. In
order to fully express the semantics of the logic of here-and-there, the only additional
requirement needed is to ensure that all formulas true in “here” are also true in “there”.
This can be conveniently expressed in terms of the condition $V \le V'$, where $V$ is the
set of atoms occurring in $\phi$. More formally, we have the following relation:

Lemma 2. Let $\phi$ be a formula on atoms $V$ and let $I_H, I_T \subseteq V$ be
interpretations. Then, $\langle I_H, I_T \rangle$ is an HT-model of $\phi$ iff
$I_H \cup I_T'$ is a model of

$\mathcal{T}_{HT}[\phi] = (V \le V') \land \tau[\phi]$.

Proof. See Appendix A. □

Encodings for Equilibrium Logic and Logic Programs with Nested Expressions 311 

Equilibrium models are then expressed as follows: 

Theorem 1. Let $\phi$ be a formula on atoms $V$. Then, $\langle I, I \rangle$ is an
equilibrium model of $\phi$ iff $I'$ is a model of

$\mathcal{T}_E[\phi] = \phi' \land \neg\exists V((V < V') \land \tau[\phi])$.

Proof. We first note that $\mathcal{T}_E[\phi]$ is obviously equivalent to

$\phi' \land \neg\exists V((V < V') \land \mathcal{T}_{HT}[\phi])$, (1)

because $(V < V')$ is logically equivalent to $(V < V') \land (V \le V')$.

Let $I \subseteq V$ be some interpretation. Recall that $\langle I, I \rangle$ is an
equilibrium model of $\phi$ iff (i) $\langle I, I \rangle$ is an HT-model of $\phi$, and
(ii) for every $J \subset I$, $\langle J, I \rangle$ is not an HT-model of $\phi$. We show
that (i) and (ii) hold iff $I'$ is a model of (1).

First of all, by Proposition 2 and a simple renaming, it follows that Condition (i)
holds iff $\phi'$ is true under $I'$. Now consider the QBF
$\Psi = \exists V((V < V') \land \mathcal{T}_{HT}[\phi])$. By the properties of $<$ and the
semantics of the existential quantifier, we have that $\Psi$ is true under $I'$ iff there is
some $J \subset I$ such that $\mathcal{T}_{HT}[\phi]$ is true under $J \cup I'$. Hence,
invoking Lemma 2, we get that $\Psi$ is true under $I'$ iff Condition (ii) does not hold.
Consequently, (ii) holds iff the second conjunct of (1) is true under $I'$. Therefore, (i)
and (ii) are jointly satisfied iff $I'$ is a model of (1). □

Observe that, in virtue of Lemma 1, both encodings $\mathcal{T}_{HT}[\cdot]$ and
$\mathcal{T}_E[\cdot]$ are computable in polynomial time with respect to the size of the
input formula.

To illustrate the mechanism of both transformations, consider the formula
$\phi = \neg\neg p \to p$. We already pointed out that $\phi$ is not valid in the logic of
here-and-there, although it is clearly a tautology of classical propositional logic. Let us
first construct $\tau[\phi]$, which is given by

$(\tau[\neg\neg p] \to \tau[p]) \land (\neg\neg p' \to p')$.

Applying the definition of $\tau[\cdot]$ recursively, we get

$((\neg(\neg p \land \neg p') \land \neg\neg p') \to p) \land (\neg\neg p' \to p')$. (2)

The first conjunct of (2) is equivalent to $p' \to p$, and the second conjunct is a
tautology. Hence, $\mathcal{T}_{HT}[\phi] = (p \to p') \land \tau[\phi]$ is equivalent to

$(p \to p') \land (p' \to p)$,

which has two models, viz. $M_1 = \{\}$ and $M_2 = \{p, p'\}$. Therefore, by Lemma 2, the
HT-models of $\neg\neg p \to p$ are given by $\langle \emptyset, \emptyset \rangle$ and
$\langle \{p\}, \{p\} \rangle$. Observe that $\langle \emptyset, \{p\} \rangle$ is not an
HT-model of $\phi$, which is in accordance with our discussion above.

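Concerning the transformation $\mathcal{T}_E[\cdot]$, let us compute the equilibrium models
of $\neg\neg p \to p$ by means of Theorem 1. Since $\tau[\phi]$ is equivalent to
$p' \to p$, we get that $\mathcal{T}_E[\phi]$ is equivalent to

$(\neg\neg p' \to p') \land \neg\exists p((p \to p') \land \neg(p' \to p) \land (p' \to p))$. (3)
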
David Pearce, Hans Tompits, and Stefan Woltran 



Observe that $\neg(p' \to p) \land (p' \to p)$ makes the whole formula in the scope of
$\exists p$ unsatisfiable, so (3) is equivalent to $(\neg\neg p' \to p')$, which is a
tautology of classical logic. Thus, the models of $\mathcal{T}_E[\phi]$ are given by
$\emptyset$ and $\{p'\}$, and Theorem 1 implies that the equilibrium models of
$\neg\neg p \to p$ are given by $\langle \emptyset, \emptyset \rangle$ and
$\langle \{p\}, \{p\} \rangle$.
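The correspondence stated in Theorem 1 can be checked by brute force on small formulas. The Python sketch below is an illustration under assumptions: the tuple encoding and all function names are hypothetical, and the existential quantifier over $V$ is realised by enumerating candidate sets $J$ rather than by calling a QBF solver. It computes equilibrium models once from the definition and once from the encoding $\mathcal{T}_E[\phi] = \phi' \land \neg\exists V((V < V') \land \tau[\phi])$.

```python
from itertools import combinations

def powerset(atoms):
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def prime(f):
    if f[0] in ("top", "bot"):
        return f
    if f[0] == "atom":
        return ("atom", f[1] + "'")
    return (f[0],) + tuple(prime(g) for g in f[1:])

def tau(f):
    op = f[0]
    if op in ("top", "bot", "atom"):
        return f
    if op in ("and", "or"):
        return (op, tau(f[1]), tau(f[2]))
    if op == "not":
        return ("and", ("not", tau(f[1])), ("not", prime(f[1])))
    return ("and", ("imp", tau(f[1]), tau(f[2])), ("imp", prime(f[1]), prime(f[2])))

def holds(f, I):
    """Classical truth of f under the set of atoms I."""
    op = f[0]
    if op == "top": return True
    if op == "bot": return False
    if op == "atom": return f[1] in I
    if op == "not": return not holds(f[1], I)
    if op == "and": return holds(f[1], I) and holds(f[2], I)
    if op == "or": return holds(f[1], I) or holds(f[2], I)
    return (not holds(f[1], I)) or holds(f[2], I)  # "imp"

def ht(f, w, IH, IT):
    """Here-and-there truth of f at world w ("H" or "T")."""
    above = ("H", "T") if w == "H" else ("T",)
    op = f[0]
    if op == "top": return True
    if op == "bot": return False
    if op == "atom": return f[1] in (IH if w == "H" else IT)
    if op == "not": return all(not ht(f[1], u, IH, IT) for u in above)
    if op == "and": return ht(f[1], w, IH, IT) and ht(f[2], w, IH, IT)
    if op == "or": return ht(f[1], w, IH, IT) or ht(f[2], w, IH, IT)
    return all((not ht(f[1], u, IH, IT)) or ht(f[2], u, IH, IT) for u in above)

def equilibrium_by_definition(f, V):
    return [I for I in powerset(V)
            if ht(f, "H", I, I)
            and not any(J < I and ht(f, "H", J, I) for J in powerset(I))]

def equilibrium_by_encoding(f, V):
    result = []
    for I in powerset(V):
        Ip = frozenset(a + "'" for a in I)
        if not holds(prime(f), Ip):          # first conjunct: phi' under I'
            continue
        # Second conjunct: no J with (V < V') and tau[phi] true under J ∪ I'.
        # Under J ∪ I', the test V < V' amounts to J being a proper subset of I.
        if not any(J < I and holds(tau(f), J | Ip) for J in powerset(V)):
            result.append(I)
    return result

p, q = ("atom", "p"), ("atom", "q")
phi = ("imp", ("not", ("not", p)), p)
print(equilibrium_by_definition(phi, {"p"}) == equilibrium_by_encoding(phi, {"p"}))  # True
```

For $\neg\neg p \to p$ both functions return the two sets $\emptyset$ and $\{p\}$, matching the worked example; other small formulas such as $p \lor q$ or $\neg p \to q$ can be checked in the same way.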

4 Logic Programs 

In this section we discuss logic programs with nested expressions, following Lifschitz
et al. [19].² These kinds of programs generalise normal logic programs by allowing
bodies and heads of rules to contain arbitrary Boolean formulas. We show that logic
programs with nested expressions can be encoded in terms of QBFs in linear time,
exploiting the reduction of equilibrium logic discussed in the previous section. Based on
the resulting transformations, in Section 5 we derive several complexity results related
to logic programs with nested expressions.

We start with some basic notation. A formula of $\mathcal{L}_V$ whose sentential
connectives comprise only $\land$, $\lor$, or $\neg$ is called an expression. A rule, $r$,
is an ordered pair of the form

$H(r) \leftarrow B(r)$,

where $B(r)$ and $H(r)$ are expressions. We call $B(r)$ the body of $r$ and $H(r)$ the head
of $r$. A program, $\Pi$, is a finite set of rules.

We employ for rules and programs the same notational convention concerning priming as we
did for formulas, i.e., $r'$ is the result of replacing each atom $v$ in $r$ by $v'$ and,
similarly, $\Pi'$ is the result of replacing each $r \in \Pi$ by $r'$.

Note that programs properly generalise disjunctive logic programs [10], which are
characterised by the condition that bodies of rules are conjunctions of literals, and heads
are disjunctions of atoms.

In what follows, we associate to each rule $r$ a corresponding formula
$\hat{r} = B(r) \to H(r)$ and, accordingly, to each program $\Pi$ a corresponding set of
formulas $\hat{\Pi} = \{\hat{r} \mid r \in \Pi\}$.

We call expressions, rules, and programs basic iff they do not contain the operator
$\neg$. An interpretation $I$ is a model of a basic program $\Pi$ if it is a model of the
associated set $\hat{\Pi}$ of formulas.

Given an interpretation $I$ and an (arbitrary) program $\Pi$, the reduct, $\Pi^I$, of $\Pi$
with respect to $I$ is the basic program obtained from $\Pi$ by replacing every occurrence
of an expression $\neg\psi$ in $\Pi$ which is not in the scope of any other negation by
$\bot$ if $\psi$ is true under $I$, and by $\top$ otherwise. $I$ is a stable model of $\Pi$
iff it is a minimal model (with respect to set inclusion) of the reduct $\Pi^I$.
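The reduct construction can be sketched in a few lines of Python. The encoding below (rules as `(head, body)` pairs of expression tuples) and the function names are assumptions made for illustration; stable models are found by exhaustive enumeration, which is feasible only for very small programs.

```python
from itertools import combinations

# Expressions: ("top",), ("bot",), ("atom", name), ("not", e), ("and", e, f), ("or", e, f)
# A program is a list of rules (head, body).

def powerset(atoms):
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def holds(e, I):
    op = e[0]
    if op == "top": return True
    if op == "bot": return False
    if op == "atom": return e[1] in I
    if op == "not": return not holds(e[1], I)
    if op == "and": return holds(e[1], I) and holds(e[2], I)
    if op == "or": return holds(e[1], I) or holds(e[2], I)
    raise ValueError(f"unknown connective: {op}")

def reduct_expr(e, I):
    """Replace each outermost negated subexpression by a truth constant."""
    op = e[0]
    if op == "not":
        return ("bot",) if holds(e[1], I) else ("top",)
    if op in ("and", "or"):
        return (op, reduct_expr(e[1], I), reduct_expr(e[2], I))
    return e

def reduct(program, I):
    return [(reduct_expr(h, I), reduct_expr(b, I)) for h, b in program]

def is_model(program, I):
    """I satisfies every rule read as the implication body -> head."""
    return all(holds(h, I) or not holds(b, I) for h, b in program)

def stable_models(program, atoms):
    result = []
    for I in powerset(atoms):
        red = reduct(program, I)
        minimal = not any(J < I and is_model(red, J) for J in powerset(I))
        if is_model(red, I) and minimal:
            result.append(I)
    return result

p, q = ("atom", "p"), ("atom", "q")
choice = [(p, ("not", q)), (q, ("not", p))]   # p <- not q,  q <- not p
nested = [(p, ("not", ("not", p)))]           # p <- not not p

print(stable_models(choice, {"p", "q"}))  # the two models {p} and {q}
print(stable_models(nested, {"p"}))       # the two models {} and {p}
```

The second program uses a nested negation in its body; its stable models coincide with the equilibrium models of $\neg\neg p \to p$.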

A formula $\phi$ is said to be a brave consequence of a logic program $\Pi$ iff there is a
stable model $I$ of $\Pi$ such that $\phi$ is true under $I$, and $\phi$ is a skeptical
consequence of $\Pi$ iff $\phi$ is true under all stable models $I$ of $\Pi$.

Two logic programs, $\Pi_1$ and $\Pi_2$, are equivalent iff they possess the same stable
models. Following Lifschitz et al. [18], we call $\Pi_1$ and $\Pi_2$ strongly equivalent
iff, for every program $\Pi$, $\Pi_1 \cup \Pi$ and $\Pi_2 \cup \Pi$ are equivalent. The
following property was shown in [18]:

Proposition 3. Let $\Pi_1$ and $\Pi_2$ be programs, and let
$\hat{\Pi}_i = \{B(r) \to H(r) \mid r \in \Pi_i\}$, for $i = 1, 2$. Then, $\Pi_1$ and
$\Pi_2$ are strongly equivalent iff $\hat{\Pi}_1$ and $\hat{\Pi}_2$ are equivalent in
the logic of here-and-there.

² Here we consider languages with only one kind of negation, however, corresponding to default
negation.

Strong equivalence is a useful concept to simplify parts of programs. In Section 5, 
we analyse the computational cost of deciding whether two programs are strongly equiv- 
alent. 

The following result establishes the close connection between equilibrium models 
and stable models, showing that stable models are actually a special case of equilibrium 
models: 

Proposition 4 ([18]). For any program $\Pi$, $I$ is a stable model of $\Pi$ iff
$\langle I, I \rangle$ is an equilibrium model of $\hat{\Pi}$.

In concluding our review of logic programs, let us note that in [19] a transformation is
presented which maps logic programs with nested expressions into equivalent disjunctive
logic programs.³ However, this translation is in general not polynomial because it relies
on distributive laws which yield an exponential explosion in some instances. Indeed,
this is, e.g., always the case whenever the head of a rule is an expression in disjunctive
normal form, or the body is an expression in conjunctive normal form. In this section, on
the other hand, we shall construct a translation of programs into QBFs which is linear
in the size of the input program.

The first step towards this translation is the following QBF encoding of logic pro- 
grams, which is an immediate consequence of Theorem 1 and Proposition 4: 

Theorem 2. Let $\Pi$ be a logic program, $\hat{\Pi} = \{B(r) \to H(r) \mid r \in \Pi\}$ the
set of formulas associated with $\Pi$, and $V$ the set of atoms occurring in $\Pi$. Then,
$I \subseteq V$ is a stable model of $\Pi$ iff $I'$ is a model of

$\mathcal{T}_E[\hat{\Pi}] = \hat{\Pi}' \land \neg\exists V((V < V') \land \tau[\hat{\Pi}])$.

As we know from Section 3, this encoding is quadratic in the size of the input 
program. In order to get a linear transformation, we consider the following adaption of 
transformation t[-]: 

Definition 2. Let $\phi$ be an expression. Then, $\tau^*[\phi]$ is recursively defined as
follows:

1. if $\phi$ is an atom, or one of $\top$ or $\bot$, then $\tau^*[\phi] = \phi$;

2. if $\phi = (\phi_1 \circ \phi_2)$, for $\circ \in \{\land, \lor\}$, then
$\tau^*[\phi] = \tau^*[\phi_1] \circ \tau^*[\phi_2]$;

3. if $\phi = \neg\psi$, then $\tau^*[\phi] = \neg\psi'$.

Obviously $lc(\tau^*[\phi]) = lc(\phi)$, for any expression $\phi$. Observe that the
difference between $\tau^*[\phi]$ and $\tau[\phi]$ lies in the treatment of negation: in
$\tau^*[\phi]$ the recursion comes to a halt if $\phi$ is a negated formula. Intuitively,
this follows from an application of Part 2 of Proposition 1 for negated formulas.
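A minimal Python sketch of the optimised translation, assuming a tuple encoding of expressions and hypothetical helper names, shows that $\tau^*$ leaves the logical complexity unchanged because a negated subexpression is simply primed instead of being translated recursively.

```python
# Expressions: ("top",), ("bot",), ("atom", name), ("not", e), ("and", e, f), ("or", e, f)

def prime(e):
    """Rename every atom p occurring in e to its primed copy p'."""
    if e[0] in ("top", "bot"):
        return e
    if e[0] == "atom":
        return ("atom", e[1] + "'")
    return (e[0],) + tuple(prime(g) for g in e[1:])

def tau_star(e):
    """The translation of Definition 2: recursion stops at negations."""
    op = e[0]
    if op in ("top", "bot", "atom"):
        return e
    if op in ("and", "or"):
        return (op, tau_star(e[1]), tau_star(e[2]))
    if op == "not":
        return ("not", prime(e[1]))
    raise ValueError(f"not an expression: {op}")

def lc(e):
    """Logical complexity: the number of connective occurrences in e."""
    if e[0] in ("top", "bot", "atom"):
        return 0
    return 1 + sum(lc(g) for g in e[1:])

p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
expr = ("and", p, ("not", ("or", q, ("not", r))))   # p and not (q or not r)

print(tau_star(expr))
print(lc(expr), lc(tau_star(expr)))  # prints 4 4
```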

³ Observe that [19] allows literals to occur in heads as well as in bodies.

Lemma 3. Let $\phi$ be an expression and $\langle I_H, I_T \rangle$ an HT-interpretation.
Then, $I = I_H \cup I_T'$ is a model of $\tau[\phi]$ iff it is a model of $\tau^*[\phi]$.

Proof. By induction on $lc(\phi)$.

INDUCTION BASE. Assume $lc(\phi) = 0$. Then, $\tau[\phi] = \tau^*[\phi]$ and the statement
holds trivially.

INDUCTION STEP. Assume $lc(\phi) > 0$, and let the statement hold for all expressions
$\psi$ such that $lc(\psi) < lc(\phi)$. Suppose $\phi = (\phi_1 \circ \phi_2)$, for
$\circ \in \{\land, \lor\}$. Then, $\tau[\phi] = \tau[\phi_1] \circ \tau[\phi_2]$ and
$\tau^*[\phi] = \tau^*[\phi_1] \circ \tau^*[\phi_2]$. Since $lc(\phi_i) < lc(\phi)$, for
$i = 1, 2$, by induction hypothesis we get that $\tau[\phi_1]$ and $\tau^*[\phi_1]$, as well
as $\tau[\phi_2]$ and $\tau^*[\phi_2]$, have the same models. Consequently, $\tau[\phi]$ and
$\tau^*[\phi]$ possess identical models.

Now assume that $\phi = \neg\psi$, for some expression $\psi$. Then,
$\tau^*[\phi] = \neg\psi'$. Since $\tau[\phi] = \neg\tau[\psi] \land \tau^*[\phi]$, each
model of $\tau[\phi]$ is also a model of $\tau^*[\phi]$. So suppose that
$I = I_H \cup I_T'$ is a model of $\tau^*[\phi] = \neg\psi'$, where
$\mathcal{I} = \langle I_H, I_T \rangle$ is an HT-interpretation. Now, $I$ is a model of
$\neg\psi'$ iff $\nu_{I_T'}(\neg\psi') = 1$. Hence, renaming yields
$\nu_{I_T}(\neg\psi) = 1$. By Part 1 of Proposition 1, we get that
$\nu_{\mathcal{I}}(T, \psi) = 0$, which in turn implies $\nu_{\mathcal{I}}(H, \psi) = 0$,
in virtue of Part 2 of Proposition 1. Hence, $\mathcal{I}$ is an HT-model of $\neg\psi$.
Applying Lemma 2, we conclude that $I_H \cup I_T'$ is a model of $\tau[\neg\psi]$. □

Given this result, the optimised QBF encoding for logic programs is as follows: 

Theorem 3. Let $\Pi$, $\hat{\Pi}$, and $V$ be as in Theorem 2. Then, $I \subseteq V$ is a
stable model of $\Pi$ iff $I'$ is a model of

$\mathcal{T}_S[\Pi] = \hat{\Pi}' \land \neg\exists V((V < V') \land \bigwedge_{r \in \Pi} (\tau^*[B(r)] \to \tau^*[H(r)]))$.

Proof. We show that $\mathcal{T}_S[\Pi]$ is equivalent to

$\mathcal{T}_E[\hat{\Pi}] = \hat{\Pi}' \land \neg\exists V((V < V') \land \tau[\hat{\Pi}])$.

First of all, by definition of $\tau[\cdot]$, we get

$\tau[\hat{\Pi}] = \bigwedge_{r \in \Pi} (\tau[B(r)] \to \tau[H(r)]) \land \hat{\Pi}'$.

Now consider the formula $(V < V') \land \tau[\hat{\Pi}]$. Since
$(V < V') = (V \le V') \land \neg(V' \le V)$, any model of $(V < V') \land \tau[\hat{\Pi}]$
is also a model of $(V \le V')$. Furthermore, recall that for any interpretation
$J_1 \cup J_2'$ with $J_i \subseteq V$ ($i = 1, 2$), we have that $\langle J_1, J_2 \rangle$
is an HT-interpretation iff $J_1 \cup J_2'$ is a model of $V \le V'$. Hence, by applying
Lemma 3, it follows that $(V < V') \land \tau[\hat{\Pi}]$ is equivalent to

$(V < V') \land \bigwedge_{r \in \Pi} (\tau^*[B(r)] \to \tau^*[H(r)]) \land \hat{\Pi}'$.

We obtain that $\mathcal{T}_E[\hat{\Pi}]$ is equivalent to

$\hat{\Pi}' \land \neg\exists V((V < V') \land \bigwedge_{r \in \Pi} (\tau^*[B(r)] \to \tau^*[H(r)]) \land \hat{\Pi}')$. (4)

Clearly, the occurrence of $\hat{\Pi}'$ in the scope of $\exists V$ can be moved outside the
quantifier, which gets then absorbed by the first conjunct of (4). The result of this
manipulation is $\mathcal{T}_S[\Pi]$. □
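The linear encoding of Theorem 3 can be exercised on toy programs without a QBF solver by enumerating the quantified sets explicitly. The Python sketch below is only an illustration: the rule encoding, the function names, and the brute-force treatment of $\exists V$ are assumptions of this example. A set $I$ is accepted iff $I'$ satisfies $\hat{\Pi}'$ and no proper subset $J$ of $I$ makes every implication $\tau^*[B(r)] \to \tau^*[H(r)]$ true under $J \cup I'$.

```python
from itertools import combinations

# Expressions: ("top",), ("bot",), ("atom", name), ("not", e), ("and", e, f), ("or", e, f)
# A program is a list of rules (head, body).

def powerset(atoms):
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def prime(e):
    if e[0] in ("top", "bot"):
        return e
    if e[0] == "atom":
        return ("atom", e[1] + "'")
    return (e[0],) + tuple(prime(g) for g in e[1:])

def tau_star(e):
    op = e[0]
    if op in ("top", "bot", "atom"):
        return e
    if op in ("and", "or"):
        return (op, tau_star(e[1]), tau_star(e[2]))
    return ("not", prime(e[1]))  # negation: evaluate the argument in "there"

def holds(e, I):
    op = e[0]
    if op == "top": return True
    if op == "bot": return False
    if op == "atom": return e[1] in I
    if op == "not": return not holds(e[1], I)
    if op == "and": return holds(e[1], I) and holds(e[2], I)
    return holds(e[1], I) or holds(e[2], I)  # "or"

def stable_models_via_encoding(program, atoms):
    result = []
    for I in powerset(atoms):
        Ip = frozenset(a + "'" for a in I)
        # First conjunct: I' is a classical model of the primed program.
        if not all(holds(prime(h), Ip) or not holds(prime(b), Ip) for h, b in program):
            continue
        # Second conjunct: there is no J with V < V' (J a proper subset of I)
        # such that all implications tau*[B(r)] -> tau*[H(r)] hold under J ∪ I'.
        witness = any(
            J < I and all(holds(tau_star(h), J | Ip) or not holds(tau_star(b), J | Ip)
                          for h, b in program)
            for J in powerset(atoms))
        if not witness:
            result.append(I)
    return result

p, q = ("atom", "p"), ("atom", "q")
choice = [(p, ("not", q)), (q, ("not", p))]   # p <- not q,  q <- not p
nested = [(p, ("not", ("not", p)))]           # p <- not not p

print(stable_models_via_encoding(choice, {"p", "q"}))  # the two models {p} and {q}
print(stable_models_via_encoding(nested, {"p"}))       # the two models {} and {p}
```

The enumeration over $I$ and $J$ mirrors the $\exists\forall$ quantifier pattern of $\exists V' \mathcal{T}_S[\Pi]$; in an actual implementation both loops would be delegated to a QBF solver.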

The above transformation also allows us to express the basic consequence relations 
associated with logic programs in terms of QBFs, as stated next: 



Corollary 1. Let $\Pi$ be a logic program on atoms $V$, let $\phi$ be a formula, and let
$U$ be the set of atoms occurring in $\phi$ but not in $\Pi$. Then,

1. $\phi$ is a brave consequence of $\Pi$ iff
$\exists V'(\mathcal{T}_S[\Pi] \land \forall U' \phi')$ is true; and
2. $\phi$ is a skeptical consequence of $\Pi$ iff
$\forall V'(\mathcal{T}_S[\Pi] \to \forall U' \phi')$ is true.



Theorems 2 and 3 generalise a similar QBF encoding given in [7] for the case of
disjunctive logic programs. There, the following QBF was used to evaluate a given
disjunctive logic program $\Pi$:

$\mathcal{T}_{DLP}[\Pi] = \hat{\Pi} \land \neg\exists V'((V' < V) \land \bigwedge_{r \in \Pi} ((B^+(r') \land B^-(r)) \to H(r')))$.

In this formula, $B^+(r)$ denotes the conjunction of all positive literals in $B(r)$, and
$B^-(r)$ denotes the conjunction of all negative literals in $B(r)$.

It is easily seen that, given a disjunctive logic program $\Pi$, the transformation
$\mathcal{T}_S[\Pi]$ coincides with $\mathcal{T}_{DLP}[\Pi]$, providing the priming of
formulas is interchanged.



5 Complexity 

In this section, we analyse the complexity of several decision problems associated with 
the logic of here-and-there and logic programs, respectively. Besides the reasoning tasks 
underlying the QBF encodings from the previous sections, we also deal with the com- 
plexity of determining whether two logic programs are strongly equivalent. 

For each of the tasks discussed in the following, upper complexity bounds can be 
obtained by the quantifier structure of the corresponding QBF encoding. In fact, the 
basis for this is the following well-known result. 

Proposition 5. Let $\phi$ be a propositional formula whose atoms are partitioned into
$i \ge 1$ sets $V_1, \ldots, V_i$, and let $\Phi = Q_1 V_1 \ldots Q_i V_i\, \phi$ be a QBF in
prenex form, where $Q_j \in \{\exists, \forall\}$ and $Q_k \ne Q_{k+1}$, for
$1 \le j \le i$ and $1 \le k < i$. Then, deciding whether $\Phi$ evaluates to true is

(a) $\Sigma_i^P$-complete if $Q_1 = \exists$, and

(b) $\Pi_i^P$-complete if $Q_1 = \forall$.

Recall that $\Sigma_i^P$ and $\Pi_i^P$ are the constituting members of the polynomial
hierarchy, where $\Sigma_0^P = \Pi_0^P = P$, $\Sigma_1^P = NP$, and $\Pi_1^P = coNP$.

We first deal with the satisfiability problem of the logic of here-and-there.

Theorem 4. Deciding whether a propositional formula has an HT-model is NP-complete.
Proof. Membership follows from the polynomial-time constructible reduction
$\mathcal{T}_{HT}[\cdot]$ into classical propositional logic. Hardness can be shown by
reducing the satisfiability problem for a classical propositional formula in conjunctive
normal form into the satisfiability problem in the logic of here-and-there. This can be
done as follows.

Let $\phi$ be a formula in conjunctive normal form, let $V = \{v_1, \ldots, v_n\}$ be the
set of atoms occurring in $\phi$, and let $W = \{w_1, \ldots, w_n\}$ be a set of new atoms.
Then, $\phi$ is satisfiable in classical propositional logic iff

$\bigwedge_{i=1}^{n} ((v_i \lor w_i) \land (\neg v_i \lor \neg w_i)) \land \psi$

has an HT-model, where $\psi$ results from $\phi$ by replacing each negative literal
$\neg v_i$ in $\phi$ by $w_i$. □

Next, we turn our attention to the basic reasoning tasks associated with logic pro- 
grams. The following proposition was shown in [8]: 

Proposition 6. Deciding whether a disjunctive logic program has at least one stable
model is $\Sigma_2^P$-complete.

We extend this result as follows:

Theorem 5. Deciding whether a logic program (containing nested expressions) has at
least one stable model is $\Sigma_2^P$-complete.

Proof. Theorem 3 implies that a program $\Pi$ has a stable model iff
$\exists V' \mathcal{T}_S[\Pi]$ evaluates to true, where $V$ is the set of atoms occurring
in $\Pi$. Hence, membership follows immediately from Proposition 5 by observing that
$\exists V' \mathcal{T}_S[\Pi]$ can be transformed in polynomial time into an equivalent QBF
in prenex form $\exists V_1 \forall V_2\, \psi$. Furthermore, hardness is a direct
consequence of Proposition 6. □

This theorem shows that logic programs can be faithfully reduced to disjunctive logic 
programs in polynomial time. Note that the reduction presented in [19] does not meet 
this property. 

The following result can be shown similarly to Theorem 5 (by invoking Corollary 1).

Theorem 6. We have the following complexity results:

1. Deciding whether a formula $\phi$ is a brave consequence of a logic program $\Pi$ is
$\Sigma_2^P$-complete.

2. Deciding whether $\phi$ is a skeptical consequence of $\Pi$ is $\Pi_2^P$-complete.

Concerning full equilibrium logic, the following result was shown in [25]:

Proposition 7. Deciding whether a formula has at least one equilibrium model is
$\Sigma_2^P$-hard.

Using our QBF-encoding for equilibrium logic (Theorem 1), we get a matching
upper bound as follows:

Theorem 7. Deciding whether a formula has at least one equilibrium model is
$\Sigma_2^P$-complete.

Finally, we deal with the complexity of deciding whether two logic programs are 
strongly equivalent. 

The following lemma is a direct consequence of Proposition 3 and Lemma 2. 

Lemma 4. Programs $\Pi_1$ and $\Pi_2$ are strongly equivalent iff
$\mathcal{T}_{HT}[\hat{\Pi}_1] \leftrightarrow \mathcal{T}_{HT}[\hat{\Pi}_2]$
is a tautology of classical logic, where $\hat{\Pi}_i = \{B(r) \to H(r) \mid r \in \Pi_i\}$
($i = 1, 2$).

Theorem 8. Deciding whether two logic programs are strongly equivalent is in coNP. 



6 Implementation 

Our methodology for considering encodings of different reasoning tasks into quantified 
Boolean formulas is motivated by the availability of several practically efficient QBF- 
solvers. Among the different tools, there is a propositional theorem-prover, boole, 
based on binary decision diagrams (the system can be downloaded from the Web at 
http://www.cs.cmu.edu/~modelcheck/bdd.html), a system using a generalized resolution
principle [15], several provers implementing an extended Davis-Putnam procedure
[2,11,16,9,27], as well as a distributed algorithm running on a PC-cluster [9]. With
the exception of boole, these tools do not accept arbitrary QBFs, but require the input 
formula to be in prenex conjunctive normal form. To avoid an exponential increase of 
formula size, structure preserving normal form translations [4,26] can be used to trans- 
late a general QBF into the required normal form. In contrast to the usual normal form 
translation based on distributivity laws, structure preserving normal form translations 
introduce new labels for subformula occurrences and are polynomial in the length of the 
input formula. 
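The labelling idea behind structure preserving translations can be illustrated on the propositional level. The following Python sketch is a generic Tseitin-style translation written for this illustration; it is not the specific translation of [4,26], it ignores quantifiers, and all names in it are hypothetical. Each non-atomic subformula occurrence receives a fresh label, and a constant number of clauses ties the label to its immediate subformulas, so the number of clauses grows linearly with the size of the formula.

```python
from itertools import product

# Formulas: ("atom", name), ("not", f), ("and", f, g), ("or", f, g)
# A literal is a pair (name, polarity); a clause is a list of literals.

def tseitin(f):
    """Return a list of clauses that is satisfiable iff f is satisfiable."""
    clauses, fresh = [], [0]

    def label(g):
        op = g[0]
        if op == "atom":
            return g[1]
        fresh[0] += 1
        x = f"_x{fresh[0]}"          # new label for this subformula occurrence
        if op == "not":
            a = label(g[1])
            # x <-> not a
            clauses.extend([[(x, False), (a, False)], [(x, True), (a, True)]])
        elif op == "and":
            a, b = label(g[1]), label(g[2])
            # x <-> (a and b)
            clauses.extend([[(x, False), (a, True)], [(x, False), (b, True)],
                            [(x, True), (a, False), (b, False)]])
        elif op == "or":
            a, b = label(g[1]), label(g[2])
            # x <-> (a or b)
            clauses.extend([[(x, True), (a, False)], [(x, True), (b, False)],
                            [(x, False), (a, True), (b, True)]])
        else:
            raise ValueError(f"unknown connective: {op}")
        return x

    root = label(f)
    clauses.append([(root, True)])   # assert the label of the whole formula
    return clauses

def cnf_satisfiable(clauses):
    """Brute-force satisfiability test, only meant for tiny examples."""
    names = sorted({n for c in clauses for n, _ in c})
    for bits in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, bits))
        if all(any(assignment[n] == pol for n, pol in c) for c in clauses):
            return True
    return False

p, q = ("atom", "p"), ("atom", "q")
contradiction = ("and", p, ("not", p))
satisfiable = ("and", ("or", p, q), ("not", p))

print(cnf_satisfiable(tseitin(contradiction)))  # prints False
print(cnf_satisfiable(tseitin(satisfiable)))    # prints True
print(len(tseitin(satisfiable)))                # prints 9
```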

The translations discussed in Sections 3 and 4 have been implemented as a special
module of a reasoning system [7,6,5,3], which is a prototype tool for solving several
nonmonotonic reasoning tasks based on reductions to QBFs.

The general architecture of the system is depicted in Figure 1. It consists of three
parts, namely the filter program, a QBF-evaluator, and the interpreter int. The input
filter translates the given problem description (like, e.g., a logic program and a specified
reasoning task) into the corresponding quantified Boolean formula, which is then sent
to the QBF-evaluator. The current version of the system provides interfaces to most of the
sequential QBF-solvers mentioned above. For the solvers requiring prenex normal form,
the QBFs are translated into structure preserving normal form. The result of the
QBF-evaluator is interpreted by int. Depending on the capabilities of the employed
QBF-evaluator, int provides an explanation in terms of the underlying problem instance
(e.g., listing all stable models of a given logic program). This task relies on a protocol
mapping of internal variables of the generated QBF into concepts of the problem description
which is provided by filter.

Finally, we note that, due to Theorems 4 and 8, testing the satisfiability of formulas in 
the logic of here-and-there, or testing the equivalence of two logic programs, can also be 
implemented using satisfiability checkers like SATZ [17], SATO [28] or RELSAT [1]. 



7 Concluding Remarks 

We have shown how to encode reasoning problems in the nonmonotonic system of equilibrium 
logic by means of quantified Boolean formulas and thereby how to implement 
equilibrium logic using QBF solvers. Since this logic generalises the semantics of stable 
models for logic programs with nested expressions, our QBF reduction applies a fortiori 
here as well, yielding a polynomial translation. The earlier QBF reduction of ordinary 
(disjunctive) logic programs, given in [7], can be obtained as a special case. 

Analysing these reduction methods yields complexity results for equilibrium logic, 
extending those obtained in [25]. New complexity results for nested expressions, ex- 
tending those of [8] for disjunctive logic programs, have also been obtained. As a further 
corollary we have also derived complexity results for deciding whether two logic pro- 
grams are strongly equivalent. Notice that, since the Lloyd-Topor-Semantics [20] for 
logic programs can be expressed in terms of logic programs with nested expressions, as 
shown by [19], our encoding is also applicable to this formalism. 

In future work we hope to study further refinements of the basic encoding, in par- 
ticular to show how programs with nested expressions can be efficiently reduced to 
equivalent disjunctive programs (in a suitably extended sense). Another topic is how to 
deal with a second (strong) negation operator, present in equilibrium logic and in the 
programs of [19]. Lastly, it is hoped to study examples of practical reasoning problems 
using an implementation of the formalisms studied here in the system. 



A Proof of Lemma 2 

The proof proceeds by induction on the logical complexity $lc(\varphi)$ of $\varphi$. First, observe 
that, for $I_H, I_T \subseteq V$, the pair $\langle I_H, I_T \rangle$ is an HT-interpretation iff $I_H \cup I'_T$ is a model of 
$V \leq V'$. 

INDUCTION BASE. Assume $lc(\varphi) = 0$. Then, $\varphi$ is either an atom, or one of $\top$ or $\bot$. If 
$\varphi$ is one of $\top$ or $\bot$, then the statement holds trivially. So, suppose that $\varphi = v$, for some 
atom $v$. Then, $V = \{v\}$ and $\tau[v] = v$. Hence, 

$$\mathcal{T}_{HT}[\varphi] = (v \rightarrow v') \wedge v.$$

Assume that $\mathcal{I} = \langle I_H, I_T \rangle$ is an HT-model of $\varphi$, i.e., we have that $I_H \subseteq I_T$ and 
$v \in I_H$. From this, we immediately get $v \in I_T$, and therefore $v' \in I'_T$. So, $I_H \cup I'_T = \{v, v'\}$, 
which is clearly a model of $\mathcal{T}_{HT}[\varphi]$. Conversely, if $I_H \cup I'_T$ is a model of 
$\mathcal{T}_{HT}[\varphi] = (v \rightarrow v') \wedge v$, for some $I_H \subseteq V$ and $I'_T \subseteq V'$, then $v \in I_H$. Given that 
$I_H \cup I'_T$ is a model of $(v \rightarrow v')$, it follows that $\mathcal{I} = \langle I_H, I_T \rangle$ is an HT-interpretation, 
so, in virtue of $v \in I_H$, $\mathcal{I}$ is an HT-model of $\varphi$. 

INDUCTION STEP. Assume $lc(\varphi) > 0$, and let the statement hold for all formulas $\psi$ such 
that $lc(\psi) < lc(\varphi)$. We have to consider several cases, depending on the structure of 
$\varphi$. Due to space restrictions, we only show two cases; the other ones follow by similar 
arguments. 

Assume that $\varphi = (\varphi_1 \vee \varphi_2)$. Then, $\mathcal{T}_{HT}[\varphi]$ is given by 

$$(V \leq V') \wedge \tau[\varphi_1 \vee \varphi_2], \qquad (5)$$

where $\tau[\varphi_1 \vee \varphi_2] = \tau[\varphi_1] \vee \tau[\varphi_2]$. By taking $V_i$ as the set of variables occurring in $\varphi_i$ 
($i = 1, 2$), it follows that (5) is classically equivalent to 

$$(V \leq V') \wedge (((V_1 \leq V'_1) \wedge \tau[\varphi_1]) \vee ((V_2 \leq V'_2) \wedge \tau[\varphi_2])),$$

which represents $(V \leq V') \wedge (\mathcal{T}_{HT}[\varphi_1] \vee \mathcal{T}_{HT}[\varphi_2])$. 

Suppose now that $\mathcal{I} = \langle I_H, I_T \rangle$ is an HT-model of $\varphi = (\varphi_1 \vee \varphi_2)$. We get that $\mathcal{I}$ is 
an HT-model of $\varphi_1$ or of $\varphi_2$. Without loss of generality, we assume that $\mathcal{I}$ is an HT-model 
of $\varphi_1$. Since $lc(\varphi_1) < lc(\varphi)$, by induction hypothesis it follows that $I_H \cup I'_T$ is a model 
of $\mathcal{T}_{HT}[\varphi_1]$. Hence, $I_H \cup I'_T$ is also a model of $\mathcal{T}_{HT}[\varphi_1] \vee \mathcal{T}_{HT}[\varphi_2]$. Furthermore, since 
$\mathcal{I}$ is an HT-interpretation, $I_H \cup I'_T$ is a model of $V \leq V'$. It follows that $I_H \cup I'_T$ is a 
model of $(V \leq V') \wedge (\mathcal{T}_{HT}[\varphi_1] \vee \mathcal{T}_{HT}[\varphi_2])$. Since the last formula is equivalent to 
$\mathcal{T}_{HT}[\varphi]$, we obtain that $I_H \cup I'_T$ is a model of $\mathcal{T}_{HT}[\varphi]$. The converse direction follows 
in essentially the same way. 

Now assume that $\varphi = \neg\psi$. By definition of $\tau[\cdot]$, 

$$\mathcal{T}_{HT}[\varphi] = (V \leq V') \wedge \neg\tau[\psi] \wedge \neg\psi'.$$

Suppose that $\mathcal{I} = \langle I_H, I_T \rangle$ is an HT-model of $\varphi = \neg\psi$. Then, $\nu_{\mathcal{I}}(w, \psi) = 0$, for 
each $w \in \{H, T\}$. Since $lc(\psi) < lc(\varphi)$, and given that $\nu_{\mathcal{I}}(H, \psi) = 0$, the induction 
hypothesis implies that $I_H \cup I'_T$ is not a model of $\mathcal{T}_{HT}[\psi] = (V \leq V') \wedge \tau[\psi]$. But 
$I_H \cup I'_T$ is a model of $(V \leq V')$ (since $\mathcal{I}$ is an HT-interpretation), so $I_H \cup I'_T$ is not a 
model of $\tau[\psi]$. It follows that $I_H \cup I'_T$ is a model of $\neg\tau[\psi]$. On the other hand, given 
that $\nu_{\mathcal{I}}(T, \psi) = 0$, Part 1 of Proposition 1 implies that $\nu_{I_T}(\psi) = 0$. Hence, a simple 
renaming yields that $I_H \cup I'_T$ is a model of $\neg\psi'$. Finally, $I_H \cup I'_T$ is a model of $V \leq V'$ 
because $\mathcal{I}$ is an HT-interpretation. We conclude that $I_H \cup I'_T$ is a model of $\mathcal{T}_{HT}[\varphi]$. The 
proof of the converse direction proceeds analogously. 
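
For small formulas the statement of the lemma can also be checked by brute force. The sketch below covers only atoms, disjunction and negation, i.e. the cases spelled out in the proof above; the tuple encoding of formulas, the convention of writing a primed atom as the name followed by `'`, and all function names are assumptions of this illustration.

```python
from itertools import combinations

def subsets(atoms):
    items = sorted(atoms)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def atoms_of(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms_of(g) for g in f[1:]))

def ht_value(f, world, ih, it):
    """Truth of f at world 'H' or 'T' of the HT-interpretation <ih, it>."""
    if f[0] == "atom":
        return f[1] in (ih if world == "H" else it)
    if f[0] == "or":
        return ht_value(f[1], world, ih, it) or ht_value(f[2], world, ih, it)
    # negation: the argument must be false at this world and at all later ones
    worlds = ("H", "T") if world == "H" else ("T",)
    return not any(ht_value(f[1], w, ih, it) for w in worlds)

def prime(f):
    """Rename every atom v to its primed copy v'."""
    if f[0] == "atom":
        return ("atom", f[1] + "'")
    return (f[0],) + tuple(prime(g) for g in f[1:])

def tau(f):
    if f[0] == "atom":
        return f
    if f[0] == "or":
        return ("or", tau(f[1]), tau(f[2]))
    return ("and", ("not", tau(f[1])), ("not", prime(f[1])))  # f = not psi

def classical(f, model):
    if f[0] == "atom":
        return f[1] in model
    if f[0] == "not":
        return not classical(f[1], model)
    if f[0] == "and":
        return classical(f[1], model) and classical(f[2], model)
    return classical(f[1], model) or classical(f[2], model)

def lemma2_holds(f):
    """Compare HT-models of f with classical models of (V <= V') and tau[f]."""
    v = atoms_of(f)
    for ih in subsets(v):
        for it in subsets(v):
            ht_model = ih <= it and ht_value(f, "H", ih, it)
            j = ih | {x + "'" for x in it}          # the interpretation I_H u I'_T
            ordered = all(x not in j or x + "'" in j for x in v)
            if ht_model != (ordered and classical(tau(f), j)):
                return False
    return True
```

Calling `lemma2_holds` on a formula enumerates all pairs of subsets of its atoms, so it is only practical for a handful of variables.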
Abstract. In [10] it was shown that it is possible to describe the set of 
normal inhabitants of a given type τ, in the standard simple type system, 
using an infinitary extension of the concept of context-free grammar, 
which allows for an infinite number of non-terminal symbols as well as 
production rules. The set of normal inhabitants of τ corresponds then to 
the set of terms generated by this, possibly infinitary, grammar plus all 
terms obtained from those by η-reduction. In this paper we show that 
the set of normal inhabitants of a type τ can in fact be described using 
a standard (finite) context-free grammar, and more interestingly that 
normal inhabitants of types with the same structure are described by 
identical context-free grammars, up to renaming of symbols. 



1 Introduction 

The relation of terms and types in the system TA_λ of simply typed λ-calculus 
has been a major subject of study over the last decades. The interest in this 
area is due to its importance to areas of mathematical logic and more recently 
to computer science and artificial intelligence. In fact, systems of lambda calculus 
are of importance for most knowledge representation theories and in particular 
for several systems for natural language processing. On the other hand, it is well 
known that there exists a direct correspondence (cf. [8]), via the Curry-Howard 
isomorphism, between TA_λ and the implicational fragment of intuitionistic 
propositional logic, P(→). As such, a type τ can be assigned to some λ-term if and 
only if τ is a provable formula/theorem of P(→). Furthermore, every (normal) 
term to which τ can be assigned, i.e. every (normal) inhabitant of τ, represents 
a (normal) proof of τ in the natural deduction system for the implicational 
fragment of intuitionistic logic. Thus, the study of inhabitation in TA_λ corresponds 
directly to the study of provability in P(→). The decision problem for inhabitation 
as well as the infiniteness problem of the set of normal inhabitants have been 
shown to be polynomial-space complete, respectively by Statman in [9] and by 
Hirokawa in [7]. In [2], Ben-Yelles defined an algorithm, also described in [6], to 
count and list the set Nhabs(τ) of normal inhabitants of a type τ. An adaptation 
of his algorithm for the subsystem λI and similar algorithms for principal normal 
inhabitants in both systems were presented in [3]. Other algorithms, which 
for every type produce a normal inhabitant or guarantee the non-existence of 
inhabitants, were given in [5]. 

¹ i.e. M is an inhabitant of τ 

In [10] it was shown that it is possible to describe the set of normal inhabitants 
of a type, using an infinitary extension of the concept of context-free grammar, 
which allows for an infinite number of non-terminal symbols as well as production 
rules. The set of normal inhabitants of τ corresponds then to the set of terms 
generated by this, possibly infinitary, grammar plus all terms obtained from those 
by η-reduction. In this paper we show that the set of normal inhabitants of a type 
τ can in fact be described using a standard (finite) context-free grammar, and 
more interestingly that types with the same structure share the same grammar, 
up to renaming of symbols. This result partly explains an observation made by 
Hindley in [6], where he pointed out that “when we ask for the number of normal 
inhabitants of a type, the answer is often finite and interesting patterns show up 
which are still not completely understood.” On the other hand, it provides us 
with a new angle to take into account in the study of the relation between the 
structure of formulas and their proofs. The definition of a common grammar-scheme 
for a given type structure is based on an alternative representation for 
types, introduced in [4], which gives us better insight into the nature of a type's 
structure and its relation to the structure of the set of its normal inhabitants. 

2 Background 

We assume familiarity with the basic notions in λ-calculus and use standard 
notation from [1] and [6]. Our notation differs from that in [1], since we denote 
type-variables (atoms) by “A, B, C, ...” and arbitrary types by lower-case Greek 
letters. This choice results from the standard notation for context-free grammars 
and from the fact that we will use an annotated version of the atoms in a type as 
non-terminal symbols of the context-free grammar which describes the normal 
inhabitants of this type. For type assignment we consider the system TA_λ of 
simply typed λ-calculus à la Curry (for an introduction see [6] or [1]). 

In this section we recall the definition of an alternative tree-like representation 
for types, called formula trees, first introduced in [4], which gives us an exact 
idea of the different stages during the construction of normal proofs/inhabitants 
of a formula/type. In fact, the formula tree of a type τ defines some kind of 
hierarchy over the primitive parts of τ which can be used to construct proofs of 
τ represented by proof trees. 

2.1 Formula Trees and Proof Trees 

Formula trees are trees whose nodes consist of primitive parts which are of 
one of the following forms (P1), (P2) or (P3): 

    (P1):  A        (P2):      A        (n ≥ 1)        (P3):  A 
                              / \                             | 
                           A_1 ... A_n 

Definition 1. A tree with primitive parts as nodes is a formula tree iff 

— the root of the tree is of form (P1); 

— every internal node of the tree is of form (P2); 

— every leaf of the tree is of form (P2) or (P3). 

The following algorithm computes the formula tree tree(φ) of a type φ. We 
use dashed lines for the edges of the formula tree in order to distinguish them 
from the edges in the primitive parts (nodes) of the tree. Note that every type 
φ can be written uniquely in the form φ = α_1 → ... → α_n → A, where A is an 
atom and n ≥ 0. The algorithm is given by the following. 

— If n = 0, i.e. φ = A, then tree(φ) = A. 

— If n ≥ 1, then tree(φ) is the tree whose root is the primitive part A and whose 
  subtrees, joined to the root by dashed edges, are t(α_1), ..., t(α_n), 

where t(A) is the primitive part of form (P3) with atom A, and for k ≥ 1 and 
m_1, ..., m_k ≥ 0 we define 

    t((α_11 → ... → α_1m_1 → A_1) → ... → (α_k1 → ... → α_km_k → A_k) → A) 

as the tree whose root is the primitive part of form (P2) with A on top and 
A_1, ..., A_k below, where each A_i is joined by dashed edges to the subtrees 
t(α_i1), ..., t(α_im_i). 

Note that one can easily define an inverse algorithm, which given a formula 
tree FT computes the unique type φ such that tree(φ) = FT. 
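
The decomposition into primitive parts can be sketched as follows. The encoding is an assumption of this illustration: atoms are strings, an arrow type is a tuple `("->", argument, result)`, and each primitive part is recorded as a dictionary holding its root atom, the atoms below it, and the index of the part it hangs from. Parts are numbered breadth-first, which reproduces the numbering p0, ..., p4 used for the formula ((A → B) → A → B) → (A → B) → A → B.

```python
from collections import deque

def arrow(*types):
    """Right-nested arrow type: arrow(a, b, c) stands for a -> b -> c."""
    result = types[-1]
    for argument in reversed(types[:-1]):
        result = ("->", argument, result)
    return result

def split(t):
    """Write t as a1 -> ... -> an -> A and return ([a1, ..., an], A)."""
    arguments = []
    while isinstance(t, tuple):
        arguments.append(t[1])
        t = t[2]
    return arguments, t

def formula_tree(phi):
    """List of primitive parts of tree(phi); index 0 is the root."""
    arguments, tail = split(phi)
    parts = [{"root": tail, "leaves": [], "parent": None}]
    queue = deque((alpha, 0) for alpha in arguments)
    while queue:
        alpha, parent = queue.popleft()
        sub_arguments, root = split(alpha)
        # the atoms below the root are the tails of alpha's arguments
        leaves = [split(beta)[1] for beta in sub_arguments]
        parts.append({"root": root, "leaves": leaves, "parent": parent})
        index = len(parts) - 1
        for beta in sub_arguments:
            # arguments of beta give the parts hanging below the leaf for beta
            queue.extend((gamma, index) for gamma in split(beta)[0])
    return parts
```

A part with an empty `leaves` list and a parent is of form (P3); the `parent` field records the hierarchy that valid proof trees have to respect.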

Example 1. The formula ((A → B) → A → B) → (A → B) → A → B has a 
formula tree with primitive parts 

    p0 = B      p1 =   B       p2 = B      p3 = A      p4 = A 
                      / \           |           |           | 
                     B   A          A 

where p0 is the root, p1, p2 and p3 are joined to p0 by dashed edges, and p4 is 
joined by a dashed edge to the occurrence of B at the bottom of p1. 

Definition 2. A proof tree built from a formula tree tree(φ) is obtained by 
joining primitive parts of tree(φ), identifying/overlapping (different) occurrences 
of the same type variable. We call it a valid proof tree if 

— all leaves in the proof tree are of form (P3); 

— whenever a primitive part p_i occurs beneath some primitive part p_j in the 
  formula tree tree(φ), then above every occurrence of p_i in the proof tree 
  there is at least one occurrence of p_j. 

Note that the formula tree of φ defines some kind of hierarchy over its 
primitive parts, which has to be respected by every valid proof tree built from 
tree(φ). In particular, this means that the root of every valid proof tree built 
from tree(φ) is the root of tree(φ). 

Example 2. A valid proof tree for the formula in example 1 is 

    p0                              B 
    p2     which looks like         | 
    p3                              A 
                                    | 

On the other hand, 

    p0 
    p2 
    p4 

is no valid proof tree (though it would have the same appearance), since it does 
not respect the hierarchy given by the formula tree 

         p0 
       /  |  \ 
     p1   p2   p3 
     / 
    p4 

of ((A → B) → A → B) → (A → B) → A → B, which requires that p4 should 
only be used beneath occurrences of both p0 and p1. Other valid proof trees are 
the two trees with p1 beneath p0, with p2 and p3 beneath the two atoms at the 
bottom of p1, and with p4, respectively p3, beneath p2; both have the appearance 

        B 
       / \ 
      B   A 
      |   | 
      A 
      | 

A Context-Free Grammar Representation 






2.2 Proof Trees and (Long) Normal Inhabitants 

A β-normal inhabitant M of a type φ is called a long normal inhabitant of 
φ iff every variable-occurrence z in M is followed by the longest sequence of 
arguments allowed by its type, i.e. iff each component with form (zP_1 ... P_n), 
(n ≥ 0), that is not in a function position has atomic type. The (finite) set of 
all terms obtained by η-reducing a λ-term M is called the η-family of M and 
denoted by {M}_η. It has been shown (cf. [2], [6]) that the η-families of the 
long normal inhabitants of φ partition the set of normal inhabitants of φ into 
non-overlapping finite subsets, each η-family containing just one long member. 
Furthermore, Ben-Yelles (cf. [2], [6]) showed that every normal inhabitant of 
a type φ can be η-expanded to one unique (up to α-conversion) long normal 
inhabitant of φ. A simple expansion-algorithm can be found in [6]. Thus, when 
seeking normal inhabitants of a type it is sufficient to compute the set of its 
long normal inhabitants from which all normal inhabitants can be obtained by 
η-reduction. 

In [4] an algorithm was defined which given a long inhabitant M of a type φ 
computes a valid proof tree PT(M) of tree(φ). 

Proposition 1 (in [4]). If M is a long normal inhabitant of a type φ, then 
PT(M) is a valid proof tree built from tree(φ). 

Also, an inverse algorithm was given in [4], which given the formula tree of 
a type φ, t = tree(φ), and any valid proof tree PT built from it, computes a set 
Terms(PT) of long closed normal inhabitants of φ. 

Proposition 2 (in [4]). Let PT be a valid proof tree built from the formula 
tree of some type φ. Then every member of Terms(PT) is a closed long normal 
inhabitant of φ. Furthermore, the two algorithms are complementary in the sense 
that for every closed long normal inhabitant M of φ we have M ∈ Terms(PT(M)). 

To sum up, every normal inhabitant of a type φ corresponds to one valid proof 
tree built from tree(φ), every valid proof tree built from tree(φ) corresponds to 
a finite set of normal inhabitants of φ, and distinct valid proof trees correspond 
to distinct and disjoint sets of normal inhabitants. 

3 A Context-Free Grammar for Nhabs(τ) 

3.1 Using Term-Schemes Instead of Proof Trees 

Up to now we saw that the set of normal inhabitants of a type τ can be computed 
by generating all proof trees of τ, applying the algorithm Terms and collecting 
all terms which can be obtained from these by η-reduction. In the following we 
will introduce the notion of term-scheme for λ-terms, in such a way that for each 
proof tree PT there is a term-scheme T such that Terms(PT) = C(T), where C(T) 
is the set of λ-terms represented by T. In a term-scheme T different variables of 
the same type are represented with the same name, and the corresponding set of 
λ-terms C(T) can be obtained from T by instantiating them to different names, 
while respecting scoping rules. 
Example 3. The formula ((A → A) → A) → A has a formula tree consisting of 
the single branch p0, p1, p2 (from the root downwards), with primitive parts 

    p0 = A      p1 = A      p2 = A 
                     |           | 
                     A 

and has the following valid proof trees PT1, PT2, PT3, ... 

    p0       p0       p0 
    p1       p1       p1 
    p2 ,     p1 ,     p1 ,  ... 
             p2       p1 
                      p2 

i.e. 

    A        A        A 
    |        |        | 
    A ,      A ,      A ,  ... 
    |        |        | 
             A        A 
             |        | 
                      A 
                      | 

For these proof trees the algorithm Terms produces respectively the following 
sets of long normal inhabitants: 

Terms(PT1) = { λx1.x1(λx2.x2) }, 

Terms(PT2) = { λx1.x1(λx2.x1(λx2'.x2)), λx1.x1(λx2.x1(λx2'.x2')) }, 

Terms(PT3) = { λx1.x1(λx2.x1(λx2'.x1(λx2''.x2))), 
               λx1.x1(λx2.x1(λx2'.x1(λx2''.x2'))), 
               λx1.x1(λx2.x1(λx2'.x1(λx2''.x2''))) } 

The previous example illustrates the fact that, although a proof tree may 
represent several (but always finitely many) long normal inhabitants 
of a type, all those inhabitants follow a common pattern which will be captured 
by the notion of term-scheme. As such, for every valid proof tree PT for a type 
τ, there will be a term-scheme representing exactly the terms computed by 
Terms(PT). 

Definition 3. Syntactically term-schemes are λ-terms. The (finite) set of 
λ-terms represented by a term-scheme T with variables x_1, ..., x_n is defined by 
C(T) = S^{0...0}_{x_1...x_n}(T) as follows. 

— S^{k_1...k_i...k_n}_{x_1...x_i...x_n}(x_i) = {x_i} if k_i = 0, and 
  S^{k_1...k_i...k_n}_{x_1...x_i...x_n}(x_i) = {x_i^1, ..., x_i^{k_i}} if k_i ≥ 1; ² 

— S^{k_1...k_n}_{x_1...x_n}(T_2 T_3) = {T'_2 T'_3 | T'_2 ∈ S^{k_1...k_n}_{x_1...x_n}(T_2) and T'_3 ∈ S^{k_1...k_n}_{x_1...x_n}(T_3)}; 

— S^{k_1...k_i...k_n}_{x_1...x_i...x_n}(λx_i.T_1) = {λx_i^{k_i+1}.T'_1 | T'_1 ∈ S^{k_1...k_i+1...k_n}_{x_1...x_i...x_n}(T_1)}. 

² This case corresponds to the renaming of free occurrences of variables in T. 

Example 4. The scheme T = λxy.x(λxy.xy)(λy.xy) represents the following set 
of λ-terms. 

C(T) = { λxy.x(λx'y'.xy)(λy'.xy), 
         λxy.x(λx'y'.xy)(λy'.xy'), 
         λxy.x(λx'y'.xy')(λy'.xy), 
         λxy.x(λx'y'.xy')(λy'.xy'), 
         λxy.x(λx'y'.x'y)(λy'.xy), 
         λxy.x(λx'y'.x'y)(λy'.xy'), 
         λxy.x(λx'y'.x'y')(λy'.xy), 
         λxy.x(λx'y'.x'y')(λy'.xy') } 
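
The way a term-scheme expands into a set of λ-terms can be sketched in code. The tuple encoding of terms, the priming convention for renamed binders and the names `expand` and `show` are assumptions of this illustration: every abstraction receives a fresh primed name, and every variable occurrence may refer to any enclosing abstraction over a variable of the same name.

```python
from itertools import product

LAM = "\u03bb"  # the lambda symbol, used only for printing

def var(name):
    return ("var", name)

def app(function, argument):
    return ("app", function, argument)

def lam(name, body):
    return ("lam", name, body)

def expand(term, scope=None):
    """All terms represented by a term-scheme.

    `scope` maps a scheme variable to the renamed binders currently in scope.
    """
    scope = scope or {}
    if term[0] == "var":
        # free variables are kept; bound ones may refer to any binder in scope
        return [var(n) for n in scope.get(term[1], [term[1]])]
    if term[0] == "app":
        pairs = product(expand(term[1], scope), expand(term[2], scope))
        return [app(f, a) for f, a in pairs]
    name, body = term[1], term[2]
    bound = scope.get(name, [])
    fresh = name + "'" * len(bound)          # x, x', x'', ...
    inner = {**scope, name: bound + [fresh]}
    return [lam(fresh, b) for b in expand(body, inner)]

def show(term):
    if term[0] == "var":
        return term[1]
    if term[0] == "app":
        return f"({show(term[1])} {show(term[2])})"
    return f"({LAM}{term[1]}.{show(term[2])})"

# The scheme of Example 4: \xy. x (\xy. x y) (\y. x y)
x, y = var("x"), var("y")
EXAMPLE_4 = lam("x", lam("y", app(app(x, lam("x", lam("y", app(x, y)))),
                                  lam("y", app(x, y)))))
```

`expand(EXAMPLE_4)` produces eight terms: four choices inside the first argument, where both x and y have two binders in scope, times two choices inside the second, where only y does.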



Example 5. The sets of long normal inhabitants corresponding to the proof trees 
in example 3 can respectively be represented by the following term-schemes: 

Terms(PT1) = C(λx1.x1(λx2.x2)), Terms(PT2) = C(λx1.x1(λx2.x1(λx2.x2))), 
Terms(PT3) = C(λx1.x1(λx2.x1(λx2.x1(λx2.x2)))), ... 



3.2 The Context-Free Grammar G_τ 

In this subsection we will, for every type τ, define a context-free grammar G_τ 
which generates a set of term-schemes that correspond exactly to the long normal 
inhabitants of τ. 

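Definition 4. Let τ be a type, with atoms A_1, ..., A_m and such that tree(τ) has 
primitive parts p_0, ..., p_n, where p_0 represents the root of tree(τ). Let G(τ) = 
(T, N, R, S) be the context-free grammar with 

— set of terminal symbols T = {(, ), λ, ., x_1, ..., x_n}, 

— set of non-terminal symbols N = {S} ∪ {A_i^P | 1 ≤ i ≤ m, P ∈ 2^{{1,...,n}}}, 

— start symbol S 

and such that R is the smallest set satisfying the following conditions. 

— if the root p_0, with atom A, has as descendents the primitive parts 
  p_{i_1}, ..., p_{i_k} in tree(τ), then R has exactly one production rule for S, 
  which is S → λx_{i_1} ... x_{i_k}.A^{i_1...i_k}; 

— whenever a non-terminal symbol A^P appears on the right side of a production 
  rule in R, then for every i ∈ P such that p_i is of form (P3) with atom A, 
  there is a rule in R of the form A^P → x_i; 

— whenever a non-terminal symbol A^P appears on the right side of a production 
  rule in R, then for every i ∈ P such that p_i is of the form 

          A 
         / \ 
      A_{j_1} ... A_{j_s} 

  and such that in tree(τ) each A_{j_l} has descendents p_{j^l_1}, ..., p_{j^l_{t_l}}, there is a 
  rule in R of the form 

      A^P → x_i(λx_{j^1_1} ... x_{j^1_{t_1}}.A_{j_1}^{P_1}) ... (λx_{j^s_1} ... x_{j^s_{t_s}}.A_{j_s}^{P_s}) 

  where P_l = P ∪ {j^l_1, ..., j^l_{t_l}}, for 1 ≤ l ≤ s. 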



Example 6. For α as in example 1, there is G_α = (T, N, R, S) with 

- T = {(, ), λ, ., x1, x2, x3, x4}, 

- N = {S} ∪ {A^P | P ⊆ {1, 2, 3, 4}} ∪ {B^P | P ⊆ {1, 2, 3, 4}}, 

- and production rules 

    S        → λx1x2x3.B^{123} 
    B^{123}  → x1(λx4.B^{1234})(A^{123}) 
    B^{1234} → x1(λx4.B^{1234})(A^{1234}) 
    B^{123}  → x2(A^{123}) 
    B^{1234} → x2(A^{1234}) 
    A^{123}  → x3 
    A^{1234} → x3 
    A^{1234} → x4 

This grammar generates the following infinite set of term-schemes. 

L(G_α) = { λx1x2x3.x2x3, 
           λx1x2x3.x1(λx4.x2x3)x3, 
           λx1x2x3.x1(λx4.x2x4)x3, 
           λx1x2x3.x1(λx4.x1(λx4.x2x3)x3)x3, 
           λx1x2x3.x1(λx4.x1(λx4.x2x3)x4)x3, 
           λx1x2x3.x1(λx4.x1(λx4.x2x4)x3)x3, 
           λx1x2x3.x1(λx4.x1(λx4.x2x4)x4)x3, ... } 
Thus, 

Nhabs(α) = { M | ∃T ∈ L(G_α) ∃N ∈ C(T) such that N ▷η M } 

         = { λx1x2x3.x2x3, 
             λx1x2.x2, 
             λx1x2x3.x1(λx4.x2x3)x3, 
             λx1x2x3.x1(λx4.x2x4)x3, 
             λx1x2.x1(λx4.x2x4), 
             λx1x2x3.x1x2x3, 
             λx1x2.x1x2, 
             λx1x2x3.x1(λx4.x1(λx4'.x2x3)x3)x3, 
             λx1x2x3.x1(λx4.x1(λx4'.x2x3)x4)x3, 
             λx1x2x3.x1(x1(λx4.x2x3))x3, 
             λx1x2x3.x1(λx4.x1(λx4'.x2x4)x3)x3, 
             λx1x2x3.x1(λx4.x1(λx4'.x2x4')x3)x3, 
             λx1x2x3.x1(λx4.x1x2x3)x3, ... } 
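
The derivations of this grammar can be enumerated mechanically. The sketch below hard-codes the primitive parts p1, ..., p4 of example 1 in a hypothetical table `PARTS` and lists all term-schemes whose derivation nests at most `depth` rule applications; the names `derive` and `language` are assumptions of this illustration, and arguments are always printed in parentheses, so x2x3 appears as x2(x3).

```python
from itertools import product

LAM = "\u03bb"  # the lambda symbol

# index -> (root atom, [(atom below the root, parts hanging below that atom)])
PARTS = {
    1: ("B", [("B", [4]), ("A", [])]),
    2: ("B", [("A", [])]),
    3: ("A", []),
    4: ("A", []),
}
ROOT_ATOM, ROOT_CHILDREN = "B", [1, 2, 3]

def binder(indices):
    """Abstraction prefix over the variables of the given parts (may be empty)."""
    return LAM + "".join(f"x{j}" for j in indices) + "." if indices else ""

def derive(atom, available, depth):
    """Term-schemes derivable from the non-terminal atom^available."""
    if depth == 0:
        return []
    results = []
    for i in sorted(available):
        root, leaves = PARTS[i]
        if root != atom:
            continue                       # p_i cannot overlap with this atom
        choices = []
        for leaf_atom, below in leaves:
            bodies = derive(leaf_atom, available | frozenset(below), depth - 1)
            choices.append([f"({binder(below)}{body})" for body in bodies])
        for combination in product(*choices):
            results.append(f"x{i}" + "".join(combination))
    return results

def language(depth):
    start = frozenset(ROOT_CHILDREN)
    return [binder(ROOT_CHILDREN) + s for s in derive(ROOT_ATOM, start, depth)]
```

With `depth` 2 only the scheme corresponding to λx1x2x3.x2x3 is produced; with `depth` 3 the two schemes with head x1 and a single nested abstraction appear as well.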



3.3 Correctness of G_τ 

In the following we describe an algorithm which, given a type τ and a valid proof 
tree PT built from tree(τ), computes a term-scheme T = TS(PT) representing 
the set of long normal inhabitants that corresponds to PT. 

Consider the root A of PT and let p_1, ..., p_n, with n ≥ 0, be the descendents of 
the corresponding occurrence of A in tree(τ). 

— If PT consists of the atom A closed by a primitive part of form (P3), then 
  take the corresponding primitive part p_i of form (P3) in tree(τ) and let 
  TS(PT) = λx_1 ... x_n.x_i. 

— Otherwise, PT is of the form 

            A 
          /   \ 
      PT_1 ... PT_m 

  with m ≥ 1, and such that for j = 1, ..., m the root of PT_j is B_j. Consider 
  the corresponding primitive part p_i of the form 

            A 
          /   \ 
       B_1 ... B_m 

  and let 

      TS(PT) = λx_1 ... x_n.x_i M_1 ... M_m, 

  where for 1 ≤ j ≤ m, 

      M_j = TS(PT_j). 



The following result establishes the correctness of the algorithm TS. 

Proposition 3. Let τ be a type and PT a valid proof tree built from tree(τ). 
Then, modulo α-conversions, 

C(TS(PT)) = Terms(PT). 

Proof. The result should become quite clear after recalling the definition of the 
algorithm Terms in [4]. Given the formula tree of a type φ, t = tree(φ), and 
any valid proof tree PT built from it, Terms(PT) = Terms(t, 0, PT, ∅) is defined 
as follows. In the following let L represent a set assigning variable names to 
primitive parts of t. Then, Terms(t, k, PT, L) is defined by the following. 

Let A be the root of PT and consider the corresponding occurrence of A in t with 
(direct) descendents p_1, ..., p_n, n ≥ 0. Now, let L' = L ∪ {(p_i, x_i^k) | 1 ≤ i ≤ n}. 

— If PT consists of the atom A closed by a primitive part of form (P3), then 
  take the corresponding primitive part p of form (P3) 
  and let Terms(t, k, PT, L) = {λx_1^k ... x_n^k.x | (p, x) ∈ L'}. 

— Otherwise, PT is of the form 

            A 
          /   \ 
      PT_1 ... PT_m 

  with m ≥ 1, and such that for i = 1, ..., m the root of PT_i is B_i. Consider 
  the corresponding primitive part p of the form 

            A 
          /   \ 
       B_1 ... B_m 

  and let Terms(t, k, PT, L) = {λx_1^k ... x_n^k.xM_1 ... M_m | (p, x) ∈ L', 
  M_j ∈ Terms(t, k + 1, PT_j, L'), 1 ≤ j ≤ m}. 

Now note that, when erasing all superscripts of variables in the members of 
Terms(PT), one obtains exactly one term, which is TS(PT). On the other hand, 
the computation done by C is performed during the construction of Terms(PT): in 
fact, if an abstraction over a variable x occurs in the range of another abstraction 
over x, then the two occurrences of x receive different superscripts, the available 
superscripts for x are stored in the set L and whenever x is used, all possibilities 
(i.e. all possible superscripts) are explored. • 
Proposition 4. For any type τ the grammar G_τ describes Nhabs(τ) in the sense 
that 

Nhabs(τ) = {M | ∃T ∈ L(G_τ) ∃N ∈ C(T) such that N ▷η M}. 

Proof. Note that the definition of G_τ is essentially a description of the 
possibilities of constructing valid proof trees from tree(τ). As such, the production rules 
for a non-terminal symbol A^{i_1...i_n} describe the possible ways of overlapping an 
occurrence of A with another occurrence of A in a primitive part p_i of tree(τ). 
For this, p_i has to be available, corresponding to the condition i ∈ {i_1, ..., i_n}, and 
A has to be the root of p_i. Then, instead of generating valid proof trees, these 
are simultaneously translated to the corresponding term-scheme they represent. 
• 



4 A Common Grammar Scheme for Types with the Same Structure 

Given a type τ we represent the corresponding type-structure by replacing 
occurrences of atoms by blank successively indexed boxes as illustrated in the following 
example. 

Example 7. The type-structure corresponding to α from example 1 is 

Θ = ((□1 → □2) → □3 → □4) → (□5 → □6) → □7 → □8 

with formula tree (edges of the formula tree drawn dotted) 

                □8 
             .  :  . 
           .    :    . 
         □4     □6     □7 
        /  \    |      | 
      □2   □3   □5 
       : 
      □1 
       | 

Possible instances, i.e. types with this structure, are for example, 

1. for □1 = □2 = □3 = □5 = □7 = A and □4 = □6 = □8 = B the type 

   α1 = ((A → A) → A → B) → (A → B) → A → B 

2. for □1 = □2 = A, □3 = □7 = B and □4 = □5 = □6 = □8 = C the type 

   α2 = ((A → A) → B → C) → (C → C) → B → C 

3. or for □1 = □2 = □3 = □4 = □5 = □6 = □8 = A and □7 = B the type 

   α3 = ((A → A) → A → A) → (A → A) → B → A 
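
Passing from a type to its type-structure and back to an instance is a purely syntactic operation, which the following sketch illustrates on strings. It assumes that atoms are single upper-case letters, writes the box □i as `[i]`, and uses the hypothetical helper names `structure_of` and `instantiate`.

```python
import re

def structure_of(type_text):
    """Replace each atom occurrence by a successively indexed box [1], [2], ...

    Returns the type-structure and the number of boxes.
    """
    counter = 0

    def box(_match):
        nonlocal counter
        counter += 1
        return f"[{counter}]"

    return re.sub(r"[A-Z]", box, type_text), counter

def instantiate(structure, atoms):
    """Substitute atoms for boxes; `atoms` maps box indices to atom names."""
    return re.sub(r"\[(\d+)\]", lambda m: atoms[int(m.group(1))], structure)

ALPHA = "((A -> B) -> A -> B) -> (A -> B) -> A -> B"
THETA, BOXES = structure_of(ALPHA)

# the first two instances listed above
ALPHA_1 = instantiate(THETA, {1: "A", 2: "A", 3: "A", 5: "A", 7: "A",
                              4: "B", 6: "B", 8: "B"})
ALPHA_2 = instantiate(THETA, {1: "A", 2: "A", 3: "B", 7: "B",
                              4: "C", 5: "C", 6: "C", 8: "C"})
```

Different assignments of atoms to the same boxes yield types that share one structure, which is the situation the grammar scheme defined next is designed for.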

In the following we define an algorithm which, given a type-structure Θ, 
computes a grammar(-scheme) G_Θ such that for every instance α with structure Θ a 
context-free grammar equivalent to G_α can be obtained from G_Θ by substitution 
of all occurrences of indexed boxes in this grammar by the corresponding atoms 
in α. 



Definition 5. Let Θ be a type-structure, with indexed boxes □_1, ..., □_n and with 
tail □_n, i.e. □_n is the box in the root of tree(Θ). Let G(Θ) = (T, N, R, S) be the 
context-free grammar with 

— set of terminal symbols T = {(, ), λ, ., x_1, ..., x_n}, 

— set of non-terminal symbols N = {S} ∪ {□_i^P | 1 ≤ i ≤ n, P ∈ 2^{{1,...,n}}}, 

— start symbol S 

and such that R is the smallest set satisfying the following conditions. 

— if the root has as descendents primitive parts, whose roots are respectively 
  □_{i_1}, ..., □_{i_k} in tree(Θ), then R has exactly one production rule for S, which 
  is S → λx_{i_1} ... x_{i_k}.□_n^{i_1...i_k}; 

— whenever a non-terminal symbol □_j^P appears on the right side of a production 
  rule in R, then for every i ∈ P such that there is a (unique) primitive part 
  in tree(Θ) of form (P3) with box □_i, there is a rule in R of the form □_i^P → x_i; 

— whenever a non-terminal symbol □_j^P appears on the right side of a production 
  rule in R, then for every i ∈ P such that p_i is of the form 

          □_i 
         /   \ 
      □_{i_1} ... □_{i_s} 

  and such that in tree(Θ) each □_{i_l} has as descendents primitive parts 
  respectively with roots □_{i^l_1}, ..., □_{i^l_{t_l}}, there is a rule in R of the form 

      □_i^P → x_i(λx_{i^1_1} ... x_{i^1_{t_1}}.□_{i_1}^{P_1}) ... (λx_{i^s_1} ... x_{i^s_{t_s}}.□_{i_s}^{P_s}) 

  where P_l = P ∪ {i^l_1, ..., i^l_{t_l}}, for 1 ≤ l ≤ s. 



Example 8. Let Θ be as in example 7. Then, G_Θ = (T, N, R, S) with 

- T = {(, ), λ, ., x1, ..., x8}, 

- N = {S} ∪ {□_i^P | 1 ≤ i ≤ 8, P ∈ 2^{{1,...,8}}} 

- start symbol S 

and R given by 

    S         → λx4x6x7.□8^{467} 
    □1^{1467} → x1 
    □4^{467}  → x4(λx1.□2^{1467})(□3^{467}) 
    □4^{1467} → x4(λx1.□2^{1467})(□3^{1467}) 
    □6^{467}  → x6(□5^{467}) 
    □6^{1467} → x6(□5^{1467}) 
    □7^{467}  → x7 
    □7^{1467} → x7 

Naturally, for every type-structure Θ the context-free grammar G_Θ generates 
the empty language, since no non-terminal symbol on the right side of production 
rules will appear on the left side of any rule in R. But the following proposition 
shows that G_Θ can in fact be seen as a scheme representing grammars for any 
type with structure Θ. 

Proposition 5. Let τ be a type with structure Θ, i.e. τ = 
Θ[A_1/□_1, ..., A_n/□_n], for, not necessarily distinct, atoms A_1, ..., A_n. Let G_{τ/Θ} 
be the context-free grammar obtained from G_Θ by substituting all occurrences of 
□_1, ..., □_n respectively by A_1, ..., A_n. Then (modulo α-conversion), 

Nhabs(τ) = {M | ∃T ∈ L(G_τ) ∃N ∈ C(T) such that N ▷η M} 

         = {M | ∃T ∈ L(G_{τ/Θ}) ∃N ∈ C(T) such that N ▷η M}. 

Proof. First notice that the enumerations used for G_τ and G_Θ will in general 
differ, which leads to distinct indexes of variables and superscripts of non-terminal 
symbols. So let us consider G'_τ as the grammar G_τ where every index and 
superscript i, referring to the primitive part p_i in tree(τ), is replaced by the index 
of the box in the root of this primitive part p_i. Since this attribution of new values 
to indexes is obviously injective, it is sufficient to verify that L(G'_τ) = L(G_{τ/Θ}). For 
this, note first that every production rule of G'_τ is a production rule of G_{τ/Θ}. On 
the other hand, all other rules in G_{τ/Θ} will never be used during the generation 
of a term-scheme. • 

Example 9. For the types α1, α2 and α3 in example 7 we obtain the following 
results. 

1. G_{α1/Θ} is obtained from G_Θ by substituting all occurrences of □1, □2, □3, □5 and 
   □7 by the atom A and all occurrences of □4, □6 and □8 by B. The resulting set 
   R has the following production rules 

    S        → λx4x6x7.B^{467} 
    A^{1467} → x1 
    B^{467}  → x4(λx1.A^{1467})(A^{467}) 
    B^{1467} → x4(λx1.A^{1467})(A^{1467}) 
    B^{467}  → x6(A^{467}) 
    B^{1467} → x6(A^{1467}) 
    A^{467}  → x7 
    A^{1467} → x7 

   which can be simplified to 

    S → λx4x6x7.x4(λx1.x1)x7 | λx4x6x7.x4(λx1.x7)x7 | λx4x6x7.x6x7 

   Thus, 

   Nhabs(α1) = { λx4x6x7.x4(λx1.x1)x7, λx4x6.x4(λx1.x1), 
                 λx4x6x7.x4(λx1.x7)x7, λx4x6x7.x6x7, λx4x6.x6 }. 
2. G_{α2/Θ} has production rules 

    S        → λx4x6x7.C^{467} 
    A^{1467} → x1 
    C^{467}  → x4(λx1.A^{1467})(B^{467}) 
    C^{1467} → x4(λx1.A^{1467})(B^{1467}) 
    C^{467}  → x6(C^{467}) 
    C^{1467} → x6(C^{1467}) 
    B^{467}  → x7 
    B^{1467} → x7 

   which can be simplified to 

    S       → λx4x6x7.C^{467} 
    C^{467} → x4(λx1.x1)x7 | x6(C^{467}) 

   Thus, 

   Nhabs(α2) = { λx4x6x7.x4(λx1.x1)x7, λx4x6.x4(λx1.x1), 
                 λx4x6x7.x6(x4(λx1.x1)x7), λx4x6x7.x6(x6(x4(λx1.x1)x7)), ... }. 

3. G_{α3/Θ} has production rules 

    S        → λx4x6x7.A^{467} 
    A^{1467} → x1 
    A^{467}  → x4(λx1.A^{1467})(A^{467}) 
    A^{1467} → x4(λx1.A^{1467})(A^{1467}) 
    A^{467}  → x6(A^{467}) 
    A^{1467} → x6(A^{1467}) 
    B^{467}  → x7 
    B^{1467} → x7 

   This grammar generates the empty language, thus Nhabs(α3) = ∅. 
Abstract. We propose a new operation of belief revision, called permissive 
belief revision. The underlying idea of permissive belief revision 
is to replace the beliefs that are abandoned by traditional theories with 
weaker ones, entailed by them, that still keep the resulting belief set 
consistent. This framework allows us to keep more beliefs than is usual 
with existing revision theories based on belief bases. 



1 Introduction 

In this paper we define a new kind of belief revision. We call it permissive belief 
revision, and its main advantage over traditional belief revision operations is 
that more beliefs are kept after revising a set of beliefs. To achieve this result, 
permissive revision takes the beliefs abandoned by some traditional belief revision 
operation, and weakens them, adding their weakened versions to the result 
of the traditional operation. In this way, “some parts” of the abandoned beliefs 
are still kept. 

Throughout the article we use the following notation: lower case Greek letters 
(α, β, ...) represent meta-variables that range over single formulas; lower case 
roman letters (a, b, ...) represent single atomic formulas; upper case roman letters 
(A, B, ...) represent sets of formulas; L represents the language of classical logic 
(either propositional or first-order logic). 

In Section 2 we briefly describe the work in belief revision that is relevant 
for the understanding of this article. In Section 3 we give some motivations, 
and an example, that will provide a better understanding of what is gained 
with permissive revision. After this, in Sections 4 and 5, we formally define this 
operation, and present some examples. In Section 6 we prove some properties 
about permissive revision, and show that it satisfies suitable counterparts for 
the AGM postulates. Finally, in Section 7 we discuss some relevant issues about 
our theory, in Section 8 we make a comparison with other approaches and in 
Section 9 we point out some directions in which the present work may evolve. 

2 Belief Revision 

One of the main sources of inspiration in belief revision, the AGM theory, follows 
the work of [1]. This theory deals with deductively closed sets of sentences, called 
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sets of beliefs. According to the AGM theory, there are three operations on sets 
of beliefs: expansions, contractions, and revisions. 

The AGM theory presents a drawback from a computational point of view, 
since it deals with infinite sets of beliefs. Both [8,9] and [6] modified AGM by 
working with a finite set of propositions, called a belief base, $B$, and using the
set of consequences of $B$, defined as $Cn(B) = \{\phi : B \vdash \phi\}$.$^1$

We also define permissive revision on finite sets of beliefs. The traditional
revision of a consistent belief base $B$ with a formula $\phi$, represented by $(B * \phi)$,
consists in changing $B$ in such a way that it contains $\phi$ and is consistent (if
$\phi$ is consistent). The case of interest is when $B \cup \{\phi\}$ is inconsistent, because,
otherwise, $\phi$ can just be added to $B$.

To perform the revision $(B * \phi)$ when $B \cup \{\phi\}$ is inconsistent, we have to
remove some belief(s) from $B$, before we can add $\phi$. In other words, in a revision
$(B * \phi)$ some belief(s) must be discarded from $B$.

3 Motivations 

The idea of permissive revision is to transform the beliefs that were discarded in 
a traditional revision into weaker versions and to add them to the result of the 
revision. Permissive revision, thus, corresponds to a “smaller” change in beliefs 
than traditional revision, while keeping the goal of having a consistent result. 

Conjunctions are the most obvious candidates to be weakened. This aspect
was already recognized by [7], who observed that revision theories sometimes
require giving up too many beliefs, without providing a solution to the problem.
While Lehmann only presents the problem regarding conjunctions, we argue that 
this problem is more general and that it can arise with other kinds of formulas. 

To illustrate the main idea behind the weakening of conjunctions, suppose,
for instance, that some traditional revision operation provides the result:

\[ (\{a \wedge b,\ a \Rightarrow c\} * \neg c) = \{a \Rightarrow c,\ \neg c\} \]

Permissive revision, represented by $\otimes$, weakens the abandoned formula, $a \wedge b$,
to $b$, and adds this to the result of traditional revision:

\[ (\{a \wedge b,\ a \Rightarrow c\} \otimes \neg c) = \{b,\ a \Rightarrow c,\ \neg c\} \]
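
The claims made about this small example can be checked mechanically. The following is a minimal sketch, assuming a propositional reading of the example and a brute-force truth-table test; the helper names `consistent` and `entails` are illustrative and not part of the original article.

```python
from itertools import product

NAMES = ("a", "b", "c")

def consistent(formulas):
    """True if some truth assignment over NAMES satisfies every formula."""
    for values in product([False, True], repeat=len(NAMES)):
        v = dict(zip(NAMES, values))
        if all(f(v) for f in formulas):
            return True
    return False

def entails(formulas, goal):
    """formulas |- goal iff formulas together with not-goal is inconsistent."""
    return not consistent(list(formulas) + [lambda v: not goal(v)])

# Formulas are modelled as predicates over an assignment (an assumption of this sketch).
a_and_b = lambda v: v["a"] and v["b"]
a_implies_c = lambda v: (not v["a"]) or v["c"]
not_c = lambda v: not v["c"]
b = lambda v: v["b"]

traditional = [a_implies_c, not_c]      # result of the traditional revision
permissive = [b, a_implies_c, not_c]    # result of the permissive revision

print(consistent([a_and_b, a_implies_c, not_c]))  # False: the revision is needed
print(entails(traditional, b))                    # False: b was lost
print(consistent(permissive), entails(permissive, b))  # True True: b is kept
```

The check only confirms what the text states: adding $b$ keeps the result consistent while restoring a belief that the traditional operation gave up.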

4 Formalization 

By now, it should be clear that the main task in defining permissive revision is 
the definition of a function Wk, which weakens the formula that was removed 
during traditional revision. Actually, since there may be more than one such 
formula, we consider the conjunction of all the removed formulas, and weaken it 
into a new formula which will then be added to the result of traditional revision 
to obtain permissive revision. 

$^1$ $\vdash$ represents the classical derivability operation.

The function $Wk$ will have different definitions, depending on whether we are
using classical logic, a non-monotonic logic or some other logic. In this article,
we restrict ourselves to classical first-order logic.

Weakening a formula depends, naturally, on the set of formulas into which 
we will be adding the result. Therefore, the function Wk will depend on the 
formula to weaken and a set of formulas: 

\[ Wk : \mathcal{L} \times 2^{\mathcal{L}} \rightarrow \mathcal{L} \]

$Wk(\phi, W)$ can be interpreted as “Weaken the formula $\phi$, in such a way that after
the weakened formula is added to $W$, the resulting set is not inconsistent”.

Given such a function, we can formally define the permissive revision of a set
of formulas $W$ with a formula $\phi$, $(W \otimes \phi)$. Let $\mathit{Abandoned}$ be the conjunction
of all the formulas which were abandoned during the traditional revision of $W$
with $\phi$, $\mathit{Abandoned} = \bigwedge (W - (W * \phi))$. Then, the permissive revision of $W$ with
$\phi$ is given by

\[ (W \otimes \phi) = (W * \phi) \cup \{Wk(\mathit{Abandoned}, (W * \phi))\} \]
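
The shape of this definition can be sketched as a higher-order function. This is only an illustration under stated assumptions: the parameters `revise`, `weaken` and `conjoin` are hypothetical stand-ins for the traditional revision $*$, the function $Wk$ and the conjunction of a set of formulas, and the toy versions below are hard-wired to the example of Section 3 rather than being real revision or weakening procedures.

```python
def permissive_revision(base, phi, revise, weaken, conjoin):
    """(W (x) phi) = (W * phi) U {Wk(Abandoned, (W * phi))}."""
    revised = revise(base, phi)              # (W * phi)
    abandoned = conjoin(base - revised)      # conjunction of W - (W * phi)
    return revised | {weaken(abandoned, revised)}

# Toy stand-ins (assumptions of this sketch), with formulas written as strings.
def toy_revise(base, phi):
    # Pretend the traditional operation always gives up "a & b".
    return frozenset(f for f in base if f != "a & b") | {phi}

def toy_conjoin(formulas):
    # The empty conjunction is the valid formula, written "T" here.
    return " & ".join(sorted(formulas)) if formulas else "T"

def toy_weaken(formula, against):
    # Lookup table instead of a real weakening function.
    return {"a & b": "b"}.get(formula, "T")

result = permissive_revision(frozenset({"a & b", "a -> c"}), "~c",
                             toy_revise, toy_weaken, toy_conjoin)
print(sorted(result))  # ['a -> c', 'b', '~c']
```

Keeping the three ingredients as parameters mirrors the point made in the text: permissive revision is layered on top of whatever traditional operation is chosen.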



Let us now see how a formula is weakened. Obviously, this depends on
the type of formula in question. The example in the previous section conveys
the main ideas behind weakening conjunctions. However, there are other
logical symbols besides conjunctions. Considering the usual logical symbols,
$\{\neg, \Rightarrow, \wedge, \vee, \exists, \forall\}$, we have the following definition for $Wk$.$^2$

\[
Wk(\phi, W) =
\begin{cases}
\phi & \text{if } W \cup \{\phi\} \text{ is consistent} \\
WkN(\phi, W) & \text{if } \phi \text{ is a negation} \\
WkI(\phi, W) & \text{if } \phi \text{ is an implication} \\
WkD(\phi, W) & \text{if } \phi \text{ is a disjunction} \\
WkC(\phi, W) & \text{if } \phi \text{ is a conjunction} \\
WkE(\phi, W) & \text{if } \phi \text{ is an existential rule} \\
WkU(\phi, W) & \text{if } \phi \text{ is a universal rule} \\
\top & \text{otherwise}
\end{cases}
\]

Note that, although $Wk$ will only be used, in the context of permissive revision,
to weaken a formula $\phi$ known to be inconsistent with $W$, the weakening process
is recursive (on the structure of formulas), and there may be sub-formulas which
are consistent with $W$. That’s the reason for the first case. As for the last case,
which means that $\phi$ is an atomic formula inconsistent with $W$, there is no weaker
formula we can give than a valid formula.

$^2$ This definition of $Wk$ has some steps similar to the conversion to Conjunctive
Normal Form, and it could be simpler if the knowledge base were required to be in a
canonical form (CNF for instance). However, the syntactic differences between two
logically equivalent formulas are important from the knowledge representation point
of view.

Next, we define each of the weakening functions mentioned above. We should
keep in mind that a good weakening function should allow us to keep as much
information as possible. In order to do that for non-atomic formulas, we weaken
each sub-formula and combine the results.

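When $\phi = \neg\alpha$, for some atomic formula $\alpha$, there is nothing we can retain of
the weakening of $\phi$. However, if $\alpha$ is a non-atomic formula, $a \vee b$, for instance, we
can apply logical transformations to $\phi$ to bring to the surface a kind of formula
we know how to handle. In this case $\neg(a \vee b)$ is logically equivalent to $(\neg a) \wedge (\neg b)$.

\[
WkN(\phi, W) =
\begin{cases}
Wk(\neg\alpha \wedge \neg\beta, W) & \text{if } \phi = \neg(\alpha \vee \beta) \\
Wk(\neg\alpha \vee \neg\beta, W) & \text{if } \phi = \neg(\alpha \wedge \beta) \\
Wk(\alpha \wedge \neg\beta, W) & \text{if } \phi = \neg(\alpha \Rightarrow \beta) \\
Wk(\alpha, W) & \text{if } \phi = \neg\neg\alpha \\
Wk(\forall(x)\neg\alpha(x), W) & \text{if } \phi = \neg\exists(x)\alpha(x) \\
Wk(\exists(x)\neg\alpha(x), W) & \text{if } \phi = \neg\forall(x)\alpha(x) \\
\top & \text{otherwise}
\end{cases}
\]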

Weakening an implication is treated in a similar way, transforming the implication
into the logically equivalent disjunction, and weakening the result instead.

\[ WkI(\alpha \Rightarrow \beta, W) = Wk(\neg\alpha \vee \beta, W) \]

If $\phi = \alpha \vee \beta$, and it is inconsistent with $W$ (otherwise $WkD$ would not be
used), then both $\alpha$ and $\beta$ are inconsistent with $W$. So, to weaken $\phi$ we have
to individually weaken both $\alpha$ and $\beta$, in $W$, and combine the results with the
disjunction again.

\[ WkD(\alpha \vee \beta, W) = Wk(\alpha, W) \vee Wk(\beta, W) \]

Conjunction seems to be a more complex case. To help understand its definition
we present some examples. First, consider the set $W = \{a \wedge b\}$ and its
revision with $\neg a$. Using permissive revision, we use $Wk(a \wedge b, \{\neg a\})$ and expect
it to give $b$. We just have to abandon one of the elements of the conjunction
and keep the other. However, if each element is itself a non-atomic formula, the
contradiction may be deeper inside in either one or in both of the elements of
the conjunction. For instance, given $W = \{(a \wedge b) \wedge (c \wedge d)\}$ and revising it with
$\neg(b \wedge c)$ we would like to get $(a \wedge (c \wedge d)) \vee ((a \wedge b) \wedge d)$, i.e., if it’s not possible
to have both $b$ and $c$, then we would like to have either $a$, $b$ and $d$ or $a$, $c$ and $d$.
This is the result of $WkC((a \wedge b) \wedge (c \wedge d), \{\neg(b \wedge c)\})$, according to the following
definition.

\[
\begin{aligned}
WkC(\alpha \wedge \beta, W) = {} & (Wk(\alpha, W) \wedge Wk(\beta, W \cup \{Wk(\alpha, W)\})) \vee {} \\
& (Wk(\beta, W) \wedge Wk(\alpha, W \cup \{Wk(\beta, W)\}))
\end{aligned}
\]

Handling existentially quantified formulas will be done through skolemization,
weakening the formula which results from the elimination of the existential
quantifier.

\[ WkE(\exists(x)\alpha(x), W) = Wk(\alpha(p), W), \text{ where } p \text{ is a Skolem constant} \]







Finally, the result of weakening universally quantified formulas is just T. 
This means that, in what concerns this kind of formula, permissive revision 
brings nothing new. In Section 9, we discuss some alternatives to the weakening 
of universally quantified formulas. 

\[ WkU(\forall(x)\alpha(x), W) = \top \]
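
For the propositional fragment, the definitions of this section can be sketched in a few lines of code. The following is a minimal sketch, not the authors' implementation: formulas are nested tuples, consistency is decided by a brute-force truth table, quantifiers ($WkE$, $WkU$) are left out, and all function names (`atom`, `neg`, `conj`, `disj`, `imp`, `consistent`, `equivalent`, `wk`) are assumptions made for illustration.

```python
from itertools import product

TOP = ("top",)                      # the valid formula

def atom(name): return ("atom", name)
def neg(f): return ("not", f)
def conj(f, g): return ("and", f, g)
def disj(f, g): return ("or", f, g)
def imp(f, g): return ("imp", f, g)

def atoms(f):
    """Names of the atomic formulas occurring in f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*[atoms(sub) for sub in f[1:]])

def holds(f, v):
    """Truth value of f under the assignment v (a dict from names to bools)."""
    tag = f[0]
    if tag == "top":
        return True
    if tag == "atom":
        return v[f[1]]
    if tag == "not":
        return not holds(f[1], v)
    if tag == "and":
        return holds(f[1], v) and holds(f[2], v)
    if tag == "or":
        return holds(f[1], v) or holds(f[2], v)
    return (not holds(f[1], v)) or holds(f[2], v)    # "imp"

def consistent(formulas):
    """Truth-table test: is there an assignment satisfying every formula?"""
    names = sorted(set().union(*[atoms(f) for f in formulas]))
    return any(all(holds(f, dict(zip(names, vs))) for f in formulas)
               for vs in product([False, True], repeat=len(names)))

def equivalent(f, g):
    """True if f and g have the same truth table."""
    return not consistent([neg(imp(f, g))]) and not consistent([neg(imp(g, f))])

def wk(f, w):
    """Weaken f with respect to the list of formulas w (propositional cases only)."""
    if consistent(list(w) + [f]):
        return f
    tag = f[0]
    if tag == "not":                                 # WkN
        g = f[1]
        if g[0] == "or":
            return wk(conj(neg(g[1]), neg(g[2])), w)
        if g[0] == "and":
            return wk(disj(neg(g[1]), neg(g[2])), w)
        if g[0] == "imp":
            return wk(conj(g[1], neg(g[2])), w)
        if g[0] == "not":
            return wk(g[1], w)
        return TOP
    if tag == "imp":                                 # WkI
        return wk(disj(neg(f[1]), f[2]), w)
    if tag == "or":                                  # WkD
        return disj(wk(f[1], w), wk(f[2], w))
    if tag == "and":                                 # WkC
        wa, wb = wk(f[1], w), wk(f[2], w)
        return disj(conj(wa, wk(f[2], list(w) + [wa])),
                    conj(wb, wk(f[1], list(w) + [wb])))
    return TOP                                       # inconsistent atomic formula

a, b, c = atom("a"), atom("b"), atom("c")
revised = [imp(a, c), neg(c)]                        # (W * not c) from Section 3
weakened = wk(conj(a, b), revised)
print(equivalent(weakened, b))                       # True
```

Note that `wk` returns an unsimplified formula such as $(\top \wedge b) \vee (b \wedge \top)$, so results are compared up to logical equivalence rather than syntactically. The truth-table test is exponential in the number of atoms and is only meant for small examples.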



5 Examples 

In this section we present some examples, to illustrate permissive revision. In all 
the examples we present, permissive revision keeps more beliefs than traditional 
revision. Of course, this is not always the case. Sometimes both revisions give 
the same result. 

Example 1 (Weakening of conjunctions). In the first situation both conjuncts 
are inconsistent with the result of traditional revision; in the second situation 
only one of the conjuncts is inconsistent; and in the third situation none of 
the conjuncts by itself is inconsistent, only the conjunction of them causes the 
inconsistency. 

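1. Both conjuncts are inconsistent

\[ W = \{a \wedge (b \wedge c),\ a \Rightarrow d,\ b \Rightarrow d\} \]

suppose

\[ (W * \neg d) = \{a \Rightarrow d,\ b \Rightarrow d,\ \neg d\} \]

then

\[
\begin{aligned}
& Wk(a \wedge (b \wedge c), (W * \neg d)) \\
&= (Wk(a, (W * \neg d)) \wedge Wk(b \wedge c, (W * \neg d) \cup \{Wk(a, (W * \neg d))\})) \vee {} \\
&\quad\ (Wk(b \wedge c, (W * \neg d)) \wedge Wk(a, (W * \neg d) \cup \{Wk(b \wedge c, (W * \neg d))\})) \\
&= (\top \wedge Wk(b \wedge c, (W * \neg d))) \vee (Wk(b \wedge c, (W * \neg d)) \wedge \top) \\
&= Wk(b \wedge c, (W * \neg d)) \\
&= (Wk(b, (W * \neg d)) \wedge Wk(c, (W * \neg d) \cup \{Wk(b, (W * \neg d))\})) \vee {} \\
&\quad\ (Wk(c, (W * \neg d)) \wedge Wk(b, (W * \neg d) \cup \{Wk(c, (W * \neg d))\})) \\
&= (\top \wedge c) \vee (c \wedge \top) \\
&= c
\end{aligned}
\]

and

\[ (W \otimes \neg d) = \{a \Rightarrow d,\ b \Rightarrow d,\ \neg d,\ c\} \]

Note that in the traditional revision we can no longer derive $c$, but this is
still a consequence of the permissive revision.

2. Only one of the conjuncts is inconsistent

\[ W = \{a \wedge b,\ a \Rightarrow c\} \]

suppose

\[ (W * \neg c) = \{a \Rightarrow c,\ \neg c\} \]

then

\[
\begin{aligned}
& Wk(a \wedge b, (W * \neg c)) \\
&= (Wk(a, (W * \neg c)) \wedge Wk(b, (W * \neg c) \cup \{Wk(a, (W * \neg c))\})) \vee {} \\
&\quad\ (Wk(b, (W * \neg c)) \wedge Wk(a, (W * \neg c) \cup \{Wk(b, (W * \neg c))\})) \\
&= (\top \wedge Wk(b, (W * \neg c) \cup \{\top\})) \vee (b \wedge Wk(a, (W * \neg c) \cup \{b\})) \\
&= (\top \wedge b) \vee (b \wedge \top) \\
&= b
\end{aligned}
\]

and

\[ (W \otimes \neg c) = \{a \Rightarrow c,\ \neg c,\ b\} \]

Like before, we keep more beliefs than traditional revision, namely $b$.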

3. None of the conjuncts by itself is inconsistent

\[ W = \{a \wedge b,\ (a \wedge b) \Rightarrow c\} \]

suppose

\[ (W * \neg c) = \{(a \wedge b) \Rightarrow c,\ \neg c\} \]

then

\[
\begin{aligned}
& Wk(a \wedge b, (W * \neg c)) \\
&= (Wk(a, (W * \neg c)) \wedge Wk(b, (W * \neg c) \cup \{Wk(a, (W * \neg c))\})) \vee {} \\
&\quad\ (Wk(b, (W * \neg c)) \wedge Wk(a, (W * \neg c) \cup \{Wk(b, (W * \neg c))\})) \\
&= (a \wedge Wk(b, (W * \neg c) \cup \{a\})) \vee (b \wedge Wk(a, (W * \neg c) \cup \{b\})) \\
&= (a \wedge \top) \vee (b \wedge \top) \\
&= a \vee b
\end{aligned}
\]

and

\[ (W \otimes \neg c) = \{(a \wedge b) \Rightarrow c,\ \neg c,\ a \vee b\} \]

Example 2 (Weakening of disjunctions). We now present one example of weakening
a disjunction. Obviously, both disjuncts are inconsistent with the result
of traditional revision, otherwise the disjunction would not be inconsistent.
Furthermore, the disjuncts are both non-atomic, otherwise the result would be $\top$.

\[ W = \{(a \wedge b) \vee (c \wedge d),\ b \Rightarrow e,\ d \Rightarrow e,\ a \Rightarrow f,\ c \Rightarrow f\} \]

suppose

\[ (W * \neg e) = \{b \Rightarrow e,\ d \Rightarrow e,\ a \Rightarrow f,\ c \Rightarrow f,\ \neg e\} \]

then

\[
\begin{aligned}
& Wk((a \wedge b) \vee (c \wedge d), (W * \neg e)) \\
&= WkD((a \wedge b) \vee (c \wedge d), (W * \neg e)) \\
&= Wk(a \wedge b, (W * \neg e)) \vee Wk(c \wedge d, (W * \neg e)) \\
&= WkC(a \wedge b, (W * \neg e)) \vee WkC(c \wedge d, (W * \neg e)) \\
&= a \vee c
\end{aligned}
\]

and

\[ (W \otimes \neg e) = \{a \vee c,\ b \Rightarrow e,\ d \Rightarrow e,\ a \Rightarrow f,\ c \Rightarrow f,\ \neg e\} \]

Note that in the traditional revision we can no longer derive, for instance, $f$,
but this is still a consequence of the permissive revision.

Example 3 (Weakening an existentially quantified formula).

\[ W = \{\exists(x)\, a(x) \wedge b(x),\ \forall(x)\, a(x) \Rightarrow c(x)\} \]

suppose

\[ (W * \forall(x)\neg c(x)) = \{\forall(x)\, a(x) \Rightarrow c(x),\ \forall(x)\neg c(x)\} \]

then

\[
\begin{aligned}
& Wk(\exists(x)\, a(x) \wedge b(x), (W * \forall(x)\neg c(x))) \\
&= WkE(\exists(x)\, a(x) \wedge b(x), (W * \forall(x)\neg c(x))) \\
&= Wk(a(p) \wedge b(p), (W * \forall(x)\neg c(x))) \\
&= WkC(a(p) \wedge b(p), (W * \forall(x)\neg c(x))) \\
&= b(p)
\end{aligned}
\]

where $p$ is a Skolem constant and

\[ (W \otimes \forall(x)\neg c(x)) = \{\forall(x)\, a(x) \Rightarrow c(x),\ \forall(x)\neg c(x),\ b(p)\} \]

In words, permissive revision, unlike traditional revision, allows us to keep
the belief $\exists(x)b(x)$.

6 Properties

We now prove two essential properties of the Wk function. By essential proper- 
ties, we mean that it would be unacceptable for the Wk function not to satisfy 
them. The first property ensures that we don’t produce an inconsistent set when 
we add the result of weakening a formula to the result of the traditional revi- 
sion. The second property ensures that we are not able to derive new conclusions 
from the result of weakening a formula, that were not derivable from the formula 
itself. 

The next theorem guarantees the first of these properties. 

Theorem 1 Let $W$ be a consistent set of formulas, and $\phi$ any formula. Then
$W \cup \{Wk(\phi, W)\}$ is consistent.

Proof. If $\phi$ is consistent with $W$, then $Wk(\phi, W) = \phi$ and the result follows
trivially. Otherwise, we will prove by induction on the structure of the formula
$\phi$ that the weakening function produces a formula consistent with $W$.

If $\phi$ is a literal (an atomic formula or the negation of an atomic formula) or
a universally quantified formula, then $Wk(\phi, W) = \top$, and therefore $W \cup \{\top\}$ is
consistent, provided that $W$ is consistent.

The cases where $\phi$ is of the form $\neg\alpha$ or $\alpha \Rightarrow \beta$ reduce to one of the other cases,
since the weakening of $\phi$ in these cases reduces to the weakening of a logically
equivalent formula, with either a quantifier, a disjunction or a conjunction.

Assume that $\alpha$, $\beta$ and $\gamma(p)$, where $p$ is some constant, are formulas that
verify the theorem. Since $W \cup \{Wk(\gamma(p), W)\}$ is consistent by hypothesis, then
$W \cup \{Wk(\exists(x)\gamma(x), W)\}$ is also consistent, by definition of $WkE$. Accordingly,
given that $W \cup \{Wk(\alpha, W)\}$ is consistent, and, therefore, $W \cup \{Wk(\alpha, W) \vee
Wk(\beta, W)\}$ is consistent, we prove that $W \cup \{Wk(\alpha \vee \beta, W)\}$ is also consistent.
Finally, let $W' = W \cup \{Wk(\alpha, W)\}$, which, as we have seen, is consistent. Since,
by hypothesis, $W' \cup \{Wk(\beta, W')\}$ is consistent, i.e., $W \cup \{Wk(\alpha, W), Wk(\beta, W')\}$
is consistent, we have that $W \cup \{Wk(\alpha, W) \wedge Wk(\beta, W')\}$ is consistent, from
where it follows trivially that $W \cup \{Wk(\alpha \wedge \beta, W)\}$ is consistent, which finishes
our proof. □

Theorem 2 guarantees that the result of weakening a formula is not stronger 
than the original formula, i.e., we do not introduce new beliefs. 

Theorem 2 Let $W$ be a set of formulas, and $\phi$ any formula. Then $\phi \vdash Wk(\phi, W)$.

Proof. If $\phi \vdash \bot$ then $\phi \vdash \psi$ for every formula $\psi$ and in particular for $\psi =
Wk(\phi, W)$. If $\phi$ is consistent with $W$, then $Wk(\phi, W) = \phi$ and, obviously, $\phi \vdash
\phi = Wk(\phi, W)$. Otherwise, as above, we will prove by induction on the structure
of the formula $\phi$ that the weakening function produces a formula not stronger
than the original.

The structure of this proof is similar to the previous one: if $\phi$ is a literal or
a universally quantified formula, then $Wk(\phi, W) = \top$, and $\phi \vdash \top$; if $\phi$ is of the
form $\neg\alpha$ or $\alpha \Rightarrow \beta$, the weakening of $\phi$ reduces to the weakening of a logically
equivalent formula, with either a quantifier, a disjunction or a conjunction.

By eliminating the existential quantifier, we have that $\exists(x)\gamma(x) \vdash \gamma(p)$ for
some Skolem constant $p$. By hypothesis, $\gamma(p) \vdash Wk(\gamma(p), W) = Wk(\exists(x)\gamma(x), W)$,
and, therefore, $\exists(x)\gamma(x) \vdash Wk(\exists(x)\gamma(x), W)$.

Assume that $\alpha$ and $\beta$ are formulas that verify the theorem. Given that, by
hypothesis, $\alpha \vdash Wk(\alpha, W)$, then $\alpha \vdash Wk(\alpha, W) \vee Wk(\beta, W)$, and, likewise,
since $\beta \vdash Wk(\beta, W)$ then $\beta \vdash Wk(\alpha, W) \vee Wk(\beta, W)$. Joining the two, we have
that $\alpha \vee \beta \vdash Wk(\alpha, W) \vee Wk(\beta, W)$, i.e., $\alpha \vee \beta \vdash Wk(\alpha \vee \beta, W)$. To finish the
proof, let’s see that conjunction preserves the theorem: from $\alpha \vdash Wk(\alpha, W)$ and
$\beta \vdash Wk(\beta, W \cup \{Wk(\alpha, W)\})$, it follows trivially that $\alpha \wedge \beta \vdash Wk(\alpha \wedge \beta, W)$. □



Although we don’t consider it essential, we now prove another theorem that 
will be needed when we prove the satisfaction of the AGM postulates for our 
theory. The theorem says that the results of weakening a formula, with respect 
to two logically equivalent sets, are the same. 



Theorem 3 Let $W$ and $W'$ be two sets of formulas, such that $Cn(W) = Cn(W')$,
and $\phi$ any formula. Then $Wk(\phi, W) = Wk(\phi, W')$.

Proof. The only use of $W$ in the definition of $Wk$ is to check whether $W \cup \{\phi\}$
is consistent. Since $Cn(W) = Cn(W')$, $W \cup \{\phi\}$ is consistent, iff $W' \cup \{\phi\}$ is
consistent. □



We now show that permissive revision satisfies the AGM postulates for revi- 
sion, if the traditional revision satisfies these postulates. Since these postulates 
refer to a revision operation on belief sets (theories or closed sets), and permis- 
sive revision was defined on bases (finite sets), the first thing to do is to define 
a corresponding permissive revision on theories. 

Let $W$ be a base, $T$ the theory generated by $W$, i.e., $T = Cn(W)$, and $\phi$ a
formula. The permissive revision of the theory $T$ with the formula $\phi$, $(T \otimes_T \phi)$,
is defined by

\[ (T \otimes_T \phi) = Cn(W \otimes \phi) \]

Note that this definition implies that $(T \otimes_T \phi)$ depends not only on $T$ and $\phi$,
but also on the base $W$ from which $T$ was generated.

We recall that permissive revision is defined in terms of a traditional revision,
$*$, by:

\[ (W \otimes \phi) = (W * \phi) \cup \{Wk(\mathit{Abandoned}, (W * \phi))\} \]

Before we prove anything on permissive revision, let us assume that the traditional
revision on bases, $*$, satisfies suitable counterparts to the AGM postulates.

($*$1) $(W * \phi)$ is a base
($*$2) $\phi \in (W * \phi)$
($*$3) $(W * \phi) \subseteq W \cup \{\phi\}$
($*$4) If $\neg\phi \notin Cn(W)$, then $W \cup \{\phi\} \subseteq (W * \phi)$
($*$5) $(W * \phi)$ is inconsistent, iff $\neg\phi \in Cn(\emptyset)$
($*$6) If $\phi \Leftrightarrow \psi \in Cn(\emptyset)$, then $(W * \phi) - \{\phi\} = (W * \psi) - \{\psi\}$

Of these postulates, only postulate ($*$6) is not a straightforward counterpart
to the corresponding AGM postulate. The straightforward counterpart would be

If $\phi \Leftrightarrow \psi \in Cn(\emptyset)$, then $(W * \phi) = (W * \psi)$

Since we are dealing with bases, and not closed sets, it is not reasonable to expect
such a result (unless, of course, $\phi$ and $\psi$ are not only equivalent, but also the
same formula). What is reasonable to assume is that the revisions of the same
base with two equivalent (but different) formulas only differ between them in
these formulas.

If we now define a traditional revision on theories as

\[ (T *_T \phi) = Cn(W * \phi) \]

where $T$ is the theory generated by the base $W$, $T = Cn(W)$, it is trivial to
show that the AGM postulates are satisfied by $*_T$.

($*_T$1) $(T *_T \phi)$ is a theory, i.e., $(T *_T \phi) = Cn(T *_T \phi)$
($*_T$2) $\phi \in (T *_T \phi)$
($*_T$3) $(T *_T \phi) \subseteq Cn(T \cup \{\phi\})$
($*_T$4) If $\neg\phi \notin T$, then $Cn(T \cup \{\phi\}) \subseteq (T *_T \phi)$
($*_T$5) $(T *_T \phi)$ is inconsistent iff $\neg\phi \in Cn(\emptyset)$
($*_T$6) If $\phi \Leftrightarrow \psi \in Cn(\emptyset)$, then $(T *_T \phi) = (T *_T \psi)$

Now that we have established the postulates satisfied by the traditional re- 
vision on which permissive revision is based, we prove the following theorem. 



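Theorem 4 Let $\otimes_T$ be a permissive revision on theories defined as before:

\[ (T \otimes_T \phi) = Cn(W \otimes \phi). \]

Then, for any theory $T$ (generated from a base $W$), and any formulas $\phi$ and $\psi$,
$\otimes_T$ satisfies the AGM postulates.

($\otimes_T$1) $(T \otimes_T \phi)$ is a theory, i.e., $(T \otimes_T \phi) = Cn(T \otimes_T \phi)$
($\otimes_T$2) $\phi \in (T \otimes_T \phi)$
($\otimes_T$3) $(T \otimes_T \phi) \subseteq Cn(T \cup \{\phi\})$
($\otimes_T$4) If $\neg\phi \notin T$, then $Cn(T \cup \{\phi\}) \subseteq (T \otimes_T \phi)$
($\otimes_T$5) $(T \otimes_T \phi)$ is inconsistent, iff $\neg\phi \in Cn(\emptyset)$
($\otimes_T$6) If $\phi \Leftrightarrow \psi \in Cn(\emptyset)$, then $(T \otimes_T \phi) = (T \otimes_T \psi)$

Proof.

($\otimes_T$1) $(T \otimes_T \phi)$ is a theory, i.e., $(T \otimes_T \phi) = Cn(T \otimes_T \phi)$.

By definition of $\otimes_T$.

($\otimes_T$2) $\phi \in (T \otimes_T \phi)$.

By definition of $\otimes_T$ and ($*$2).

($\otimes_T$3) $(T \otimes_T \phi) \subseteq Cn(T \cup \{\phi\})$.

By definition of $\otimes_T$ and $\otimes$

\[ (T \otimes_T \phi) = Cn(W \otimes \phi) = Cn((W * \phi) \cup \{Wk(\mathit{Abandoned}, (W * \phi))\}) \]

By ($*$3) ($(W * \phi) \subseteq W \cup \{\phi\}$), Theorem 2 ($\mathit{Abandoned} \vdash Wk(\mathit{Abandoned}, W)$),
and monotonicity

\[ Cn((W * \phi) \cup \{Wk(\mathit{Abandoned}, (W * \phi))\}) \subseteq Cn(W \cup \{\phi\} \cup \{\mathit{Abandoned}\}) \]

By definition of $\mathit{Abandoned}$, $\mathit{Abandoned} = \bigwedge (W - (W * \phi))$, we have that $W \vdash
\mathit{Abandoned}$, so

\[ Cn(W \cup \{\phi\} \cup \{\mathit{Abandoned}\}) = Cn(W \cup \{\phi\}) \]

Finally, since

\[ Cn(W \cup \{\phi\}) = Cn(Cn(W) \cup \{\phi\}) = Cn(T \cup \{\phi\}) \]

our proof of ($\otimes_T$3) is complete.

($\otimes_T$4) If $\neg\phi \notin T$, then $Cn(T \cup \{\phi\}) \subseteq (T \otimes_T \phi)$.

\[ Cn(T \cup \{\phi\}) = Cn(Cn(W) \cup \{\phi\}) = Cn(W \cup \{\phi\}) \]

By ($*$4), since $\neg\phi \notin T$, we have that $W \cup \{\phi\} \subseteq (W * \phi)$. This, together with
monotonicity, implies that

\[ Cn(W \cup \{\phi\}) \subseteq Cn(W * \phi) \]

Now, if $\neg\phi \notin T$, then, by definition, $\mathit{Abandoned} = \bigwedge \{\} = \top$, and
$Wk(\mathit{Abandoned}, (W * \phi)) = \top$. So, $(W \otimes \phi) = (W * \phi)$, and $(T \otimes_T \phi) = Cn(W * \phi)$.

($\otimes_T$5) $(T \otimes_T \phi)$ is inconsistent, iff $\neg\phi \in Cn(\emptyset)$.

By definition,

\[ (T \otimes_T \phi) = Cn(W \otimes \phi) = Cn((W * \phi) \cup \{Wk(\mathit{Abandoned}, (W * \phi))\}). \]

If $\neg\phi \in Cn(\emptyset)$, then, by ($*$5), $(W * \phi)$ is inconsistent, and so is $(T \otimes_T \phi)$.

If $\neg\phi \notin Cn(\emptyset)$, then, by ($*$5), $(W * \phi)$ is consistent, and, by Theorem 1, so is
$(W * \phi) \cup \{Wk(\mathit{Abandoned}, (W * \phi))\}$.

($\otimes_T$6) If $\phi \Leftrightarrow \psi \in Cn(\emptyset)$, then $(T \otimes_T \phi) = (T \otimes_T \psi)$.

By definition,

\[ (T \otimes_T \phi) = Cn(W \otimes \phi) = Cn((W * \phi) \cup \{Wk(\mathit{Abandoned}_\phi, (W * \phi))\}), \]

where $\mathit{Abandoned}_\phi = \bigwedge (W - (W * \phi))$, and

\[ (T \otimes_T \psi) = Cn(W \otimes \psi) = Cn((W * \psi) \cup \{Wk(\mathit{Abandoned}_\psi, (W * \psi))\}), \]

where $\mathit{Abandoned}_\psi = \bigwedge (W - (W * \psi))$.

Since, by ($*$6), $(W * \phi) - \{\phi\} = (W * \psi) - \{\psi\}$, we have that

\[ \mathit{Abandoned}_\phi = \mathit{Abandoned}_\psi \]

Since, by ($*_T$6), $Cn(W * \phi) = Cn(W * \psi)$, by Theorem 3, we have that

\[ Wk(\mathit{Abandoned}_\phi, (W * \phi)) = Wk(\mathit{Abandoned}_\psi, (W * \psi)) \]

which ends our proof. □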



7 Discussion 

Traditional belief revision theories [9,6] may produce different results when
revising logically equivalent theories with the same formula, i.e., they are
syntax-dependent. For example, the fact that both $a$ and $b$ are true may be
represented either by $\{a \wedge b\}$ or by $\{a, b\}$. These two representations will provide
different results when revised with $\neg a$. This should be expected, since we are
dealing with syntax-based approaches and finite belief sets. However, in this
example, if permissive revision is applied to weaken the formulas removed by
traditional revision, the result will be the same, $\{b\}$. This allows us to conclude
that, in some cases, the syntax-dependency of traditional approaches is nullified
by permissive revision. Furthermore, the weakening function by itself is
syntax-dependent, as the following example shows: $Wk(a \vee b \Rightarrow c, \{a, \neg c\}) = \top$, but
$Wk((a \Rightarrow c) \wedge (b \Rightarrow c), \{a, \neg c\}) = b \Rightarrow c$. Again, such a behaviour should be
expected: the weakening function completely relies on the syntax of the formula
to be weakened. If this dependency was considered a flaw, the use of a canonical
form for the formula to be weakened would very easily eliminate it. However,
since the weakening function is applied to the result of a traditional theory,
which is syntax-dependent, it wouldn’t make much sense. Theorem 3, on the
other hand, shows that this function is not dependent on the syntax of the set
with respect to which a formula is weakened.

A preliminary report of the work presented in this article appears in [3]. The
present article contains, in addition to the preliminary report, Theorem 3 and
the proof of the AGM postulates.

8 Comparison with Other Approaches

To the best of our knowledge, when we submitted this article no similar work 
had been done, except for our preliminary report [3]. The work presented in [2] 
however, also aims at minimizing the loss of information by weakening infor- 
mation involved in conflicts rather than completely removing it. We will first 
convey the main ideas behind their work, and then present some comments on 
the comparison of both approaches. 

In [2] it is assumed that the available information is given as an ordered
knowledge base ($KB$), i.e., a ranking of information as logical sentences:
$KB = \{S_1, S_2, \ldots, S_n\}$. When revising a $KB$ with a formula $\phi$, they start with $i = 1$
and $KB = \{\phi\}$; for each $S_i$, if it is consistent with $KB$ then $KB \leftarrow KB \cup S_i$.
Otherwise, all possible disjunctions (of the formulas in conflict) of size 2
are computed. If they are consistent with $KB$ then they are added to $KB$.
Otherwise, all possible disjunctions of size 3 are computed, and so on.

One major difference between both approaches is that [2] is a “complete”
revision operation, while ours can be applied to the result of any traditional
revision operation.$^3$ So, our theory is also more permissive in the sense that it
allows any traditional theory to choose the formulas to weaken.

The only example in [2] has its knowledge base in clausal form, although this 
does not seem to be a requirement. If we convert the examples in this paper 
to clausal form, both approaches produce exactly the same results in all the 
examples. Further work is needed to prove whether this is always so. However, 
if we use our original examples, the results are not the same. Actually, since in all
our examples only one formula is removed, the work of [2] simply discards that 
formula, while ours weakens it. 

9 Future Work 

As we saw in Section 4, universal rules are weakened to T, which is obviously 
too drastic a solution. This aspect can be improved in two directions. When 
considering a monotonic logic, a universal rule can be weakened following the 
general ideas presented in Section 4. For instance, if we have $\forall(x)\, a(x) \wedge b(x)$,
and revise this with $\neg a(p)$, the universal rule must be abandoned, but it can be
weakened to $\forall(x)b(x)$.

In another direction, i.e., when considering a non-monotonic logic, the most 
natural way of weakening a universal rule is to turn it into the “corresponding” 
default rule. Of course, defining the exact meaning of “corresponding” default 
rule will depend on the particular non-monotonic logic being considered, but we 
can state this informally as turning a universal like “All As are Bs” into the 
default “Typically, As are Bs”. See [10] for an approach to this problem, using 
Default Logic. 

We intend to implement permissive revision on top of SNePSwD [4]. The fact
that this system is a belief revision system, with an underlying non-monotonic
logic [5], will be particularly helpful. This system already has mechanisms for
determining the consistency of belief sets, and keeping a record of inconsistent
sets, which will be necessary for the implementation of the weakening function.

$^3$ Our approach could even be applied to the result of theirs.

Abstract. This paper addresses constraint solving over continuous domains in
the context of decision making, and discusses the trade-off between precision in
the definition of the solution space and the computational effort required. As an
alternative to local consistency, which is usually maintained in handling
continuous constraints, we discuss maintaining global hull-consistency.
Experimental results show that this may be an appropriate choice, achieving 
acceptable precision with relatively low computational cost. The approach 
relies on efficient algorithms and the best results are obtained with the 
integration of a local search procedure within interval constraint propagation. 



1 Introduction 

Model-based decision support relies on an explicit representation of a system in some
domain of interest. Given the inevitable simplification that a model introduces, rather
than obtaining the solution in the (simplified) model that optimises a certain goal, a
decision maker is often more interested in finding a compact representation of the
solutions that satisfy some constraints. From the extra knowledge that the decision maker
might have, the set of acceptable solutions may be subsequently focussed into some
region of interest defined by means of additional constraints.

An interesting implementation of these ideas is presented in [8], which addresses
this type of problem in the domain of engineering. In this domain, models are usually
specified as a set of constraints on continuous variables (ranging over the reals).
Moreover, these constraints are usually non-linear, which makes their processing quite
difficult, since any small errors may easily be expanded throughout computation. The
approach is based on the exploitation of octrees in order to determine the regions of
the 3D space that satisfy the intended constraints. Although the solution proposed is
rather ingenious, its application to problems with non-convex solution spaces runs
into difficulties, namely the large number of convex solutions that might be
computed.

Rather than finding precise bounds of many convex 3D spaces, a simpler approach
relies on defining the upper and lower bounds of every variable that contains the
solution space. Although this is less precise than the former, subsequent interaction
with the user may shorten these bounds to specific regions of interest. This approach
relies on reasoning about variables that lie on certain intervals, namely the ability to
shorten their interval domain by pruning the (outward) regions where it is possible to
prove that no solutions may exist. This is the aim of interval constraints, where variables
range over intervals and are subject to constraint propagation to reduce their domains.

As with other domains, constraint propagation of intervals maintains some form of
local consistency of the constraint set. Typically, two types of local consistency are
considered: box-consistency [1, 15] and hull-consistency [9, 2]. Neither of them is
complete and the pruning of the domains that is obtained is often quite poor. To
improve this pruning, maintenance of higher order local consistencies (3B-consistency
[9], Bound-consistency [13]) might be envisaged, but their implementation is
complex and published results are very scarce. We have recently proposed global
hull-consistency [4, 6], but its implementation suffered from similar efficiency
difficulties.

In this paper, we present improved procedures that maintain global 
hull-consistency, and show that the best results are obtained with the integration of 
constraint propagation with a form of local search commonly adopted in 
multidimensional root finding over the reals. 

The paper is organised as follows. To make the paper self contained, sections 2 and 
3 overview the main concepts involved in, respectively, constraint solving in 
continuous domains and interval constraints. Section 4 presents the definition of 
global hull-consistency and describes several procedures to enforce it. One such 
procedure requires the search for a solution, and a local search implementation of 
such procedure is presented in section 5. Section 6 compares the results obtained on a 
particular example, by a) applying the octree approach; b) maintaining local
consistency and c) maintaining global hull-consistency. For the latter case, the section
discusses the efficiency of the different procedures. Finally, the main conclusions are 
summarised and future research directions discussed. 



2 Continuous Constraint Solving Problems 

Many problems of the real world can be modelled as constraint satisfaction problems
(CSPs). A CSP is defined by a set of variables each with an associated domain of
possible values and a set of constraints on subsets of the variables. A constraint 
specifies which values from the domains of its variables are compatible. A solution is 
an assignment of values to all variables, which satisfies all the constraints. 

The notion of CSP was initially introduced [10] to address combinatorial problems 
over finite domains. Continuous CSPs (CCSPs) [8] are extensions of the earlier CSP
framework to address variables with continuous domains. In CCSPs all variable
domains are continuous real intervals and the constraints are specified as numeric
relations, either equalities or inequalities. 
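
As an illustration of these definitions, a CCSP can be written down as a mapping from variables to interval domains plus a list of numeric relations. The sketch below is an assumption-laden toy, not part of the original paper: the variable names, the closed domains and the two constraints are invented for the example, and it only checks whether a given point is a solution.

```python
def is_solution(point, domains, constraints):
    """A point is a solution if it lies in every domain and satisfies every constraint."""
    inside = all(lo <= point[name] <= hi for name, (lo, hi) in domains.items())
    return inside and all(check(point) for check in constraints)

# Hypothetical CCSP: two variables over closed real intervals, two inequalities.
domains = {"x": (-2.0, 2.0), "y": (-2.0, 2.0)}
constraints = [
    lambda p: p["x"] ** 2 + p["y"] ** 2 <= 1.0,   # inside the unit disc
    lambda p: p["y"] >= p["x"],                   # above the diagonal
]

print(is_solution({"x": 0.0, "y": 0.5}, domains, constraints))  # True
print(is_solution({"x": 1.0, "y": 1.0}, domains, constraints))  # False
```

This toy problem has infinitely many solutions (half of the unit disc), so enumerating them is not an option and only a description of the solution set is meaningful.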

The CCSP framework is powerful enough to model a wide range of real world
problems, in particular problems involving uncertain continuous parameters. A CCSP
can have one, several or no solutions. In many practical applications the modelling of
a problem as a CCSP is embedded in a larger decision process. Depending on this
decision process it may be desirable to determine whether a solution exists (verify the
consistency of the CCSP), to find one solution, to compute the space of all solutions
of the CCSP, or to find an optimal solution relative to a given cost function.

In this work we address under-constrained CCSPs, that is, CCSPs with a large
(usually infinite) solution set, where the goal is not to find a particular solution but
rather to capture the set of all solutions. This is often the case of continuous decision
problems where the constraints are not enough to identify a single solution but may
circumscribe the set of possibilities to some specific regions of the search space.



2.1 The Representation of Continuous Domains 

In CCSPs, the initial domains associated with the variables are infinite connected sets 
of real numbers called real intervals. A real interval, which can be closed, open or half 
open, will be generally denoted by <a..b> where a and b represent the bounds.

In practice, computer systems are restricted to represent a finite subset of the real
numbers, the floating-point numbers. Several authors [9, 1, 15] have defined the set F
of machine numbers (F-numbers) as the set of floating-point numbers augmented with
the two infinity symbols (-∞ and +∞). F is totally ordered and, if f is an F-number, f⁻
and f⁺ denote respectively the two F-numbers immediately below and immediately
above f in the total order. An F-interval is a closed real interval [a..b] where a and b
are F-numbers. In particular, if b=a or b=a⁺ the F-interval is called canonical.

The finite subset of the real intervals that can be represented by a particular
machine is the set of all F-intervals. However, any real interval may be associated
with a larger F-interval (wrt set inclusion). The F-interval approximation of a real
interval <a..b> is the smallest F-interval [⌊a⌋..⌈b⌉] containing all its elements (where
⌊a⌋ denotes the largest F-number not greater than a and ⌈b⌉ denotes the smallest
F-number not smaller than b).

Real box, F-box and F-box approximation are extensions to several dimensions of
the concepts of a real interval, F-interval and F-interval approximation. Real boxes
and F-boxes will be denoted by tuples of real intervals and F-intervals,
respectively. A real box is the Cartesian product of real intervals. An F-box is the
Cartesian product of F-intervals; in particular, if all the F-intervals are canonical the
F-box is canonical. The F-box approximation of a real box is the smallest F-box
enclosing the real box.

We will call a canonical solution of a CCSP any canonical F-box that cannot be
proved inconsistent (wrt the CCSP) either because it contains solutions or due to
approximation errors in the evaluation of the constraint set.
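
The definitions of this subsection can be illustrated with a minimal Python sketch, assuming that IEEE double-precision floats play the role of the F-numbers and that exact rationals stand in for real numbers. The helper names (f_floor, f_ceil, f_interval, is_canonical) are hypothetical and do not come from the paper; math.nextafter requires Python 3.9 or later and returns the neighbouring machine number in the given direction.

```python
import math
from fractions import Fraction

def f_floor(a: Fraction) -> float:
    """Largest float not greater than the exact rational a."""
    x = float(a)                          # rounds to the nearest float
    if Fraction(x) > a:                   # rounding went upwards: step one float down
        x = math.nextafter(x, -math.inf)
    return x

def f_ceil(b: Fraction) -> float:
    """Smallest float not smaller than the exact rational b."""
    x = float(b)
    if Fraction(x) < b:                   # rounding went downwards: step one float up
        x = math.nextafter(x, math.inf)
    return x

def f_interval(a: Fraction, b: Fraction):
    """Smallest float interval enclosing the real interval <a..b>."""
    return (f_floor(a), f_ceil(b))

def is_canonical(lo: float, hi: float) -> bool:
    """True if the bounds are equal or adjacent machine numbers."""
    return hi == lo or hi == math.nextafter(lo, math.inf)

# 1/10 is not representable in binary floating point, so its
# approximation is an interval between two adjacent floats.
tenth = Fraction(1, 10)
lo, hi = f_interval(tenth, tenth)
```

In this sketch the degenerate real interval containing only 1/10 is approximated by two adjacent floats, which is a canonical interval in the sense defined above.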



2.2 Solving Continuous Constraint Satisfaction Problems 

Solving a CCSP can be seen as a search process over the representable elements of 
the lattice of the variable domains. For a given CCSP a complete domain lattice L is 
defined by the elements obtained from the power set of the initial domains box, 
partially ordered by set inclusion (⊆) and closed under arbitrary intersections (∩) and
unions (∪). A search procedure may start at the top of the domain lattice and navigate
over the accessible elements of the lattice until eventually stopping, returning one of 
them. If it returns the bottom element (the empty set {}) then the CCSP has no
solution. The navigation over the lattice elements usually alternates pruning with
branching steps and ends whenever a stopping criterion is satisfied. To simplify the
domains representation, most solving strategies impose that the only elements
considered in the pruning and branching steps are those representable by a single F-box.

Pruning consists of replacing an element of the lattice by a smaller element (wrt
set inclusion) as a result of applying an appropriate filtering algorithm, which
eliminates some value combinations inconsistent with the constraints. The filtering
algorithm attempts to narrow the original F-box into a smaller one (or prove its
inconsistency) guaranteeing that no possible real solution is lost. If the filtering
algorithm were complete, all inconsistent combinations of values would be deleted
and the new top element would only contain the solution space. However, this is
generally not the case, and several inconsistent combinations may still remain.

The branching step may be applied when the pruning step fails to further eliminate 
inconsistent combinations of values. The idea is to split a set of value combinations
into smaller sets to which the pruning step is later applied, hoping to get better filtering
results on these reduced sets. The branching step usually consists of splitting the
original F-box into two smaller F-boxes by splitting one of the variable domains.

Search over different branches may be done concurrently or by some backtracking 
mechanism until a stopping criterion is attained. This stopping criterion may be the
achievement of the intended goal or the satisfaction of some specific properties
imposed to avoid the complexity explosion of the search procedure. If the goal is the
set of all solutions then the final result must be the union of the elements remaining at
the end of the search process. 
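
The alternation of pruning and branching can be sketched as follows. This is a simplified illustration and not the solver used in the paper: the constraint x² + y² = 1 is a hypothetical toy example, the "pruning" step only rejects boxes whose interval evaluation excludes a solution (it does not narrow them), bounds are not rounded outwards, and the stopping criterion is a fixed width eps instead of canonical boxes.

```python
def sqr(iv):
    """Exact range of t*t for t in the interval iv = (lo, hi)."""
    lo, hi = iv
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))

def may_contain_solution(box):
    """False only if x^2 + y^2 = 1 provably has no solution in the box."""
    (xl, xh), (yl, yh) = sqr(box[0]), sqr(box[1])
    return xl + yl <= 1.0 <= xh + yh

def branch_and_prune(box, eps):
    """Return the boxes of width <= eps that could not be discarded."""
    stack, kept = [box], []
    while stack:
        current = stack.pop()
        if not may_contain_solution(current):    # pruning step (reject only)
            continue
        widths = [hi - lo for lo, hi in current]
        i = widths.index(max(widths))            # widest variable domain
        if widths[i] <= eps:                     # stopping criterion
            kept.append(current)
            continue
        lo, hi = current[i]
        mid = (lo + hi) / 2.0                    # branching step
        for part in ((lo, mid), (mid, hi)):
            child = list(current)
            child[i] = part
            stack.append(tuple(child))
    return kept

def hull(boxes):
    """Smallest box enclosing all the given boxes."""
    n = len(boxes[0])
    return tuple((min(b[k][0] for b in boxes), max(b[k][1] for b in boxes))
                 for k in range(n))

boxes = branch_and_prune(((-2.0, 2.0), (-2.0, 2.0)), 0.05)
enclosure = hull(boxes)
```

The union of the boxes in `boxes` covers the solution set (the unit circle), and `enclosure` is a single box around it whose bounds exceed the true extremes by at most eps.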



3 Interval Constraints 

The Interval Constraints framework, first introduced by Cleary [3], combines
Artificial Intelligence techniques with Interval Analysis methods for solving CCSPs. 

Pruning of variable domains is based on constraint propagation techniques initially 
developed in Artificial Intelligence for finite domains. The main idea is to use partial 
information expressed by a constraint to eliminate some incompatible values from the 
domain of the variables within the scope of the constraint. Once the domain of a 
variable is reduced, this information is propagated to all constraints with that variable 
in their scopes, which must be checked again to possibly reduce further the domains
of the other constrained variables. Propagation terminates when a fixed point is
attained, that is, the variable domains cannot be further reduced by any constraint.

To reduce the domains of the variables appearing in a constraint, a filtering 
algorithm must enforce a local property relating their domains according to the
constraint. This property, imposed at the constraint level to each constraint, is called 
local consistency. 

Interval Analysis, introduced by Moore in [11], enables the definition of sound local
filtering algorithms. Moreover, efficient interval methods developed in Interval
Analysis (e.g. the interval Newton method [11, 7]) are used in interval constraints
for producing efficient filtering algorithms.



3.1 Local Consistency

In CCSPs the enforcement of a local consistency wrt a constraint consists of
narrowing an F-box (representing the domains of the variables within the constraint)
into the largest F-box within the original one satisfying some local property.

The two main local consistencies used for solving CCSPs are hull-consistency [2]
(or 2B-consistency [9]) and box-consistency [1, 15]. Both are approximations of
arc-consistency [10], widely used in finite domains. Arc-consistency eliminates a
value from a variable domain if it has no support, i.e. if no compatible values exist in
the domain of the other variables sharing the same constraint. In continuous domains
this kind of enumeration is not possible, so hull and box-consistency both assume that
the domains of the variables are convex (they are represented by F-intervals), and aim
at simply tightening their outer bounds. If the correct domains (wrt a constraint) are
disjunctive both local consistencies admit inconsistent values within the outer limits
of the variable domains.

The key idea behind hull-consistency is to guarantee arc-consistency only at the
bounds of the variable domains. If a variable is instantiated with the value of one of
its bounds then there must be a consistent instantiation of the other constraint
variables. In practice, the result of enforcing hull-consistency on a box of domains is
an F-box approximation of the real box that would be obtained by enforcing the
above local property. The existing hull-consistency algorithms decompose the
original constraint system into a set of primitive constraints (with the addition of extra
variables), for which this property can be enforced by interval arithmetic operations.

The major drawback of this decomposition approach is the worsening of the 
locality problem. The existence of intervals satisfying a local property on each 
constraint does not imply the existence of value combinations satisfying 
simultaneously all of them. When a complex constraint is subdivided into primitive 
constraints this will only worsen this problem due to the addition of new variables and 
the consequent loss of dependency between values of related variables. 

Consider constraint x1×(x2-x1)=0 and F-boxes A=<[0..2],[1..2]> and
B=<[-1..3],[1..2]>. Clearly, any solution of the above constraint must be a point
within the line x1=0 or within the line x1=x2. Therefore, box A, which is not
arc-consistent (since x1=0.5 has no support in x2∈[1..2]), is hull-consistent because
each bound of each variable has a support in the other variable (e.g. x1=2 has the
support x2=2 defining a point within the line x1=x2). Moreover, A is the largest
hull-consistent F-box within B and so it should be obtained by enforcing this property on
B. However, the existing algorithms do not enforce hull-consistency directly on the
above constraint, but on the decomposed set of primitives x1×x3=0 and x2-x1=x3,
introducing the new variable x3. Because B'=<[-1..3],[1..2],[-2..3]> is hull-consistent
wrt the set of primitives, box B would not be narrowed by these algorithms.

Box-consistency guarantees the consistency of each bound of the domain of each
variable with the F-intervals of the others. After substituting all but one variable by
their interval domains, the problem of bounding the remaining variable is tackled by
numerical methods, in particular a combination of interval Newton iterates and
bisection [1, 15]. The main advantage of this approach is that each constraint can be
manipulated as a whole, not requiring the decomposition into primitive constraints,
thus preventing the amplification of the locality problem. In the previous example the
box B=<[-1..3],[1..2]> is not box-consistent wrt x1×(x2-x1)=0 since, for example, the
upper bound of x1 is not consistent with x2=[1..2] (0∉3×([1..2]-3)=[-6..-3]). Moreover,
the enforcement of box-consistency on B wrt the original constraint would result in box
A, which is the tightest box enclosure of its solution space within B.
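
The bound test used in this example can be reproduced with elementary interval arithmetic. The sketch below is only an illustration under simplifying assumptions: intervals are plain (lo, hi) tuples without outward rounding, and the function names are hypothetical. It fixes x1 to a candidate bound, replaces x2 by its interval domain [1..2], and checks whether zero belongs to the resulting interval.

```python
def mul(a, b):
    """Interval product: range of s*t for s in a and t in b."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def sub(a, b):
    """Interval difference: range of s-t for s in a and t in b."""
    return (a[0] - b[1], a[1] - b[0])

def constraint_at(x1_value, x2_domain):
    """Interval evaluation of x1*(x2 - x1) with x1 fixed to a point."""
    x1 = (x1_value, x1_value)
    return mul(x1, sub(x2_domain, x1))

def bound_is_consistent(x1_value, x2_domain):
    """True if 0 belongs to the interval evaluation at this x1 bound."""
    lo, hi = constraint_at(x1_value, x2_domain)
    return lo <= 0.0 <= hi

x2 = (1.0, 2.0)
at_upper_bound_of_B = constraint_at(3.0, x2)   # 3*([1..2]-3)
```

With these definitions the bounds -1 and 3 of x1 in box B fail the test, whereas the bounds 0 and 2 of x1 in box A pass it, in agreement with the text.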

The domain pruning obtained by box-consistency is often insufficient. The reason 
is that if there are n uncertain variables, it is still necessary to handle functions with
n-1 interval values. Depending on the constraint, the uncertainty of the n-1 intervals
may allow a wide range of possible values for these functions, preventing the finding 
of inconsistency required for domain reduction. 



3.2 Higher Order Consistencies 

Better pruning of the variable domains may be achieved if, complementary to a local 
property, some (global) properties are also enforced on the overall constraint set.

3B-consistency [9] and Bound-consistency [13] are higher order generalizations of
hull and box-consistency respectively. In both, the property enforced on the overall
constraint set is the following: if the domain of one variable is reduced to one of its
bounds then this simplified CCSP must be locally consistent (hull or box-consistent).

The algorithms to enforce these stronger consistencies are higher order propagation 
algorithms where constraint propagation is just a procedure that may be interleaved 
with search techniques. The price to pay for a stronger consistency requirement is the 
growth in the computational cost of the enforcing algorithm. The adequacy of a 
consistency criterion for a particular CCSP must be evaluated taking into account the 
trade-off between the pruning it achieves and its execution time. Moreover one must
be aware that the filtering process is performed within a larger procedure for solving
the CCSP and it may be globally advantageous to obtain faster, if less accurate, results.



3.3 Octree Approach 

An alternative approach to the interval constraints framework for solving CCSPs was
proposed in [8]. The key idea is to compute the complete solution space of a CCSP
through the propagation of total constraints represented by a set of partitions of the
feasible space called 2^k-trees. A total constraint is a relation defining consistent value
combinations among a set of variables; this relation summarizes all the constraints in
the CCSP between the variables. Any CCSP can be represented by a set composed of
ternary and binary total constraints. A 2^k-tree is a hierarchical decomposition of the
solution space defined in the total constraint. The partition is achieved by recursively
halving simultaneously each variable domain within a region that is not clearly within
the feasible region. A maximum number of successive partitions is defined
according to a required precision. This precision may, in certain types of
problems, be increased to obtain better approximations of the solution space.

This partitioning approach is especially suited for dealing with CCSPs where all the
relations determine convex regions. When regions are not convex, they must be
decomposed into convex sub-regions, which implies the a priori definition of the 
required precision and can cause an explosion of disjoint solution regions. 



4 Global Hull-Consistency

A stronger consistency, called global hull-consistency, was proposed in [4, 6] and 
applied to solve constraint problems with parametric ordinary differential equations. 
The key idea is to generalize the local hull-consistency criterion to a higher level where
the set of all constraints is seen as a single global constraint. Hence, it must guarantee
arc-consistency at the bounds of the variable domains for this single global constraint.
If a variable is instantiated with the value of one of its bounds then there must be a
consistent instantiation of the other variables and this complete instantiation is a 
solution of the CCSP. 

Again, due to the limitations of real value representation, the result of enforcing
global hull-consistency on a box of domains must be an F-box enclosing the real box
that would be obtained by enforcing the property. Because within a canonical solution
there might be a real solution of the CCSP, the best thing that can be done is to
guarantee that for each bound of each variable there is a canonical F-box instantiation
which is a canonical solution. Hence, any strategy to enforce global hull-consistency
must be able to localise the canonical solutions within a box of domains that are
extreme with respect to each bound of each variable domain.



4.1 Enforcing Global Hull-Consistency with Existing Techniques 

In the following we will consider three different approaches that can be easily 
implemented by existing constraint systems without significant modifications of their 
propagation mechanisms. We assume that there is a solving mechanism (alternating 
pruning and branching steps) implementing a backtracking search for obtaining 
canonical solutions. The pruning is achieved by enforcing a local consistency (hull or 
box-consistency). 

The first approach (SOLVE1) is a branch and bound algorithm where the extreme
canonical solutions with respect to each bound of each variable domain are searched
separately. To search for the extreme canonical solution with respect to the lower
(upper) bound of the variable x_i the following algorithm is used:

(i) the solving mechanism is used to obtain a canonical solution of the CCSP;

(ii) if no canonical solution is found then the CCSP has no solutions;

(iii) if a new canonical solution <I_1, ..., [a..b], ..., I_n> is found, the search space
is pruned by imposing x_i<a (x_i>b) and the solving proceeds by backtracking;

(iv) when the solving mechanism proves that there are no more canonical solutions,
the last one found is guaranteed to be the extreme canonical solution with
respect to the lower (upper) bound of the variable x_i.

The second approach (SOLVE2), suggested by Frederic Benhamou (personal
communication), controls the branching strategy of the solving mechanism to direct
the search towards the extreme canonical solutions. Again the search is separately
executed for each bound of each variable domain. To search for the extreme canonical
solution wrt the lower (upper) bound of the variable x_i the strategy is:

(i) if the branching step is applied to a box where I_i is not canonical
then obtain two F-boxes by splitting I_i around its mid value;

(ii) always search first the branch with smaller (larger) x_i values (the other branch is
only searched by backtracking);

(iii) if no canonical solution is found then the CCSP has no solutions;

(iv) if a canonical solution is found then it is guaranteed to be the extreme canonical
solution wrt the lower (upper) bound of the variable x_i.
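
The directed branching strategy of the second approach can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it reuses the constraint x1×(x2-x1)=0 and the box B of section 3.1, the filtering step only rejects boxes whose interval evaluation excludes zero, bounds are not rounded outwards, and a box is treated as canonical once all its widths are at most eps. The function names are hypothetical.

```python
def mul(a, b):
    """Interval product of a = (lo, hi) and b = (lo, hi)."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def sub(a, b):
    """Interval difference a - b."""
    return (a[0] - b[1], a[1] - b[0])

def feasible(box):
    """False only when x1*(x2 - x1) = 0 provably has no solution in the box."""
    x1, x2 = box
    lo, hi = mul(x1, sub(x2, x1))
    return lo <= 0.0 <= hi

def extreme_solution(box, i, eps, lower=True):
    """Depth-first search for the first small box, visiting smaller
    (lower=True) or larger (lower=False) values of variable i first."""
    if not feasible(box):
        return None
    widths = [hi - lo for lo, hi in box]
    if widths[i] > eps:
        k = i                                   # step (i): split I_i first
    else:
        k = widths.index(max(widths))
        if widths[k] <= eps:
            return box                          # small enough: treated as canonical
    lo, hi = box[k]
    mid = (lo + hi) / 2.0
    halves = [(lo, mid), (mid, hi)]
    if k == i and not lower:
        halves.reverse()                        # step (ii): preferred branch first
    for part in halves:
        child = box[:k] + (part,) + box[k + 1:]
        found = extreme_solution(child, i, eps, lower)
        if found is not None:                   # step (iv): first one is extreme
            return found
    return None                                 # step (iii): nothing in this box

B = ((-1.0, 3.0), (1.0, 2.0))
low = extreme_solution(B, 0, 0.05, lower=True)
high = extreme_solution(B, 0, 0.05, lower=False)
```

Because the domain of x1 is always split before the other domains and the preferred half is explored first, the first small box found is extreme for x1; here the two searches bracket the x1 domain [0..2] of box A to within eps.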

The third approach (SOLVE3) is a branch and bound algorithm where the extreme
canonical solutions with respect to each bound of each variable domain are searched
simultaneously within a round robin scheme. The basic idea is to keep track of an
outer box that must include all possible canonical solutions and an inner box that is
the smallest box enclosing all the currently found canonical solutions. Initially the
solving mechanism is applied to the original domain box to obtain a first canonical
solution. If no canonical solution is found then the CCSP has no solutions. Otherwise
the inner box is initialised to the obtained canonical solution and the outer box is
initialised to the original domain box. The algorithm proceeds by alternating in a
round robin fashion the search for the extreme canonical solution wrt the lower
(upper) bound of each variable x_i:

(i) the solving mechanism is applied to a box obtained from the outer box by
substituting the upper (lower) bound of its i-th domain by the lower (upper) bound
of the i-th domain of the inner box;

(ii) if no canonical solution is found then the extreme canonical solution was already
found and the lower (upper) bound of the i-th domain of the outer box is updated to the
respective value in the inner box;

(iii) otherwise, the inner box is updated to include the new canonical solution; 

The algorithm stops when the inner box equals the outer box guaranteeing that this 
is the largest global hull-consistent box within the original domains. 

Enforcing global hull-consistency is a complex problem and this will be reflected in 
any algorithm to solve it. However, we argue that better alternatives may be devised 
to the above strategies. One explanation for the bad behaviour is that all these 
strategies rely on a solving mechanism developed for finding individual canonical
solutions. This forces the strategies to repeatedly restart the search without completely
profiting from the previous pruning effort. 



4.2 The Tree Structured Algorithm 

We propose an alternative algorithm for enforcing global hull-consistency based on a 
tree structured representation of the complete search space. During the whole process, 
the following data structures are maintained:

(i) a binary tree, where each node is an F-box representing a sub-region of the
search space that may contain solutions of the CCSP. Each parent box is the
smallest box enclosing its two children. The current search space is the union of
all the leaves of the binary tree. The root is the smallest F-box enclosing the
current search space;

(ii) an inner box, which is the smallest box enclosing all the canonical solutions
found;

(iii) an ordered list associated with each bound of each variable. Each element of
such a list is a pair (F-box, Action) where Action is a state representing the next
action (prune, search or split) to perform on the F-box. The list associated with
the lower (upper) bound of variable x_i keeps track of the leaves of the binary tree
in ascending (descending) order of their lower (upper) bounds for the x_i domain.

The algorithm alternates prune, search and split actions performed over specific
sub-regions (F-boxes) of the current search space. The pruning of an F-box is
achieved by enforcing a local consistency (hull or box-consistency). The search action
is performed with the goal of finding a canonical solution within an F-box and may be
implemented as a simple check of an initial guess or as a more complete local search
procedure (see next section). The split of an F-box is done by splitting one of its
variable domains (the one with the largest width) at the mid point.

The algorithm proceeds by considering, in round robin, the first element (FB, A) of
each ordered list associated with the lower (upper) bound of each variable x_i:

(i) if FB and the inner box have the same lower (upper) bound of the i-th domain
then do nothing (the respective extreme canonical solution was already found);

(ii) otherwise the current F-box to investigate is the sub-region of FB with x_i values
smaller (larger) than the respective lower (upper) bound in the inner box (if no
canonical solution was yet found then the current F-box is FB);

(iii) action A is performed over the current F-box and its consequences must be
propagated over the data structures maintained by the algorithm:

(a) pruning: The search space discarded in the pruning must be removed from
the binary tree and in particular from FB. This may imply narrowing FB,
eliminating FB or splitting FB into the new narrowed sub-region and the
non-investigated sub-region. Changes on the leaves of the binary tree imply
that the associated elements in the ordered lists may need to be reordered,
eliminated or inserted. In case of a new element the information about the
next action is initialised to pruning. Otherwise, after pruning, the next action
is updated to searching.

(b) search: If a canonical solution was not found then A is updated to splitting.
Otherwise, the inner box is updated to include this new canonical solution
and the elements of the ordered lists with sub-regions including it must reset
their actions to pruning.

(c) split: The binary tree must be updated with the new leaves which are
descendants of FB. The creation of new leaves implies the insertion of new
elements in the ordered lists with actions initialised to pruning.

The algorithm stops when the inner box equals the root box guaranteeing that this 
is the largest global hull-consistent box within the original domains. 

The main advantage of the proposed algorithm is the dynamic representation of the 
search space allowing the focussing on regions that are relevant for the enforcement 
of global hull-consistency without losing information obtained in the pruning process. 

Another advantage wrt the previous approaches regards the better results obtained 
by an any-time use of the algorithm. In the other strategies, the box enclosing all
possible solutions could only be narrowed by finding a canonical solution and proving
it to be extreme wrt a bound of a variable. This complete proof can be extremely
time consuming, even though some outer regions of the search space might be easily
proved inconsistent. The tree-structured algorithm first concentrates the pruning on
the outer regions of the search space, delaying the analysis of the internal, more
difficult regions. Any pruning achieved is propagated immediately along the binary
tree, maintaining, at the root, an F-box enclosing the current search space, which may
be returned at any time in the process. Moreover, this box may be compared with the
current inner box to bound the narrowing expectations.

Finally, the tree-structured representation may be used for further analysis, since it
is a more detailed representation of the solution space than just an enclosing box. It 
could help to find interactively important possible sub-regions of the search space. 
Another possibility is to enforce a stronger consistency criterion based on the 
recursive enforcement of global hull-consistency on each leaf of the tree. 



5 Local Search 

An important characteristic of the tree-structured algorithm is to explore only relevant 
sub-regions of the search space that are outside the inner box. Consequently the
discovery of a new canonical solution reduces the relevant search space as a result of 
the enlargement of the inner box. Applying local search techniques for finding 
canonical solutions within a box may significantly enhance the algorithm efficiency. 

The key idea of local search techniques is to navigate on points of the search space 
by inspecting some local properties of the current point to choose a nearby point to 
jump to. We developed the following local search algorithm for finding canonical 
solutions of a CCSP within a search box FB:

(i) initially a starting point is chosen to be the current point. If the goal is to find the
lower (upper) bound of variable x_i, the point is the mid point of FB except that
its i-th coordinate is the smallest (largest) x_i value within the search box;

(ii) if the current point is within a canonical solution then the algorithm stops;

(iii) otherwise, a multidimensional vector is obtained based on Newton's method
for multidimensional root finding (see next subsection);

(iv) a minimization process (described in subsection 5.2) obtains a new point inside
the search box and within the line segment defined by the current point and the
point obtained by applying the multidimensional vector to the current point;

(v) if the distance between the new point and the current point does not exceed a
convergence threshold then the algorithm stops without finding any solution;

(vi) otherwise the current point is updated to the new point and the algorithm
proceeds from step (ii).

The algorithm is guaranteed to stop since step (iv) ensures the minimization of a 
function for which any canonical solution is a zero and the convergence to a local 
minimum is detected in step (v). 



5.1 Obtaining a Multidimensional Vector 

The ultimate goal of obtaining a multidimensional vector is to find a solution by
applying it to the current point. The idea is to reduce the problem of finding a solution
into the problem of finding a root of a multidimensional vector function F. The vector
function must be associated with the set of constraints in a way that a zero of each
element F_i must satisfy the constraint C_i. For example, if the constraints are x1²+x2²=1
and x1×(x2-x1)=0 then F_1(x1,x2)=x1²+x2²-1 and F_2(x1,x2)=x1×(x2-x1).

Newton's iterative method for multidimensional root finding [12] is known to
rapidly converge to a root from a sufficiently good initial guess. The idea is to
compute a multidimensional vector δx (the Newton vector) which applied to the current
point x reaches a root of the multidimensional function, F(x+δx)=0. Expanding the
function in Taylor series and neglecting the higher order terms: F(x+δx)=F(x)+J·δx
(where J is the Jacobian matrix). Finally, if x+δx is a root, then J·δx=-F(x).

Solving this equation for δx we obtain an estimate of the corrections to the
current point x that move each function F_i(x) closer to zero simultaneously. To solve
the equation, it is convenient to use a numerical technique called Singular Value
Decomposition (SVD) [14] since it can be used to solve any set of linear equations.

The equation may have zero, one or several solutions. If there are no solutions the
SVD technique computes the value of δx that minimizes the distance between the two
sides of the equation, which is the best possible correction. If there are one or more
solutions, the SVD technique computes the δx that is closest to zero (minimizes |δx|),
which is the smallest possible correction.
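
For reference, the linearisation and its SVD solution can be written out as follows. The last line is the standard pseudo-inverse construction and is given here as an assumption about how the SVD is applied; the paper itself only states that the technique is used. The symbols U, V and w_j (the singular values of J) are introduced for this illustration.

```latex
\begin{align}
F(x+\delta x) &\approx F(x) + J\,\delta x,
  \qquad J_{ij} = \frac{\partial F_i}{\partial x_j},\\
J\,\delta x &= -F(x),\\
J &= U\,\mathrm{diag}(w_1,\dots,w_n)\,V^{T},\\
\delta x &= -V\,\mathrm{diag}\!\left(\tfrac{1}{w_1},\dots,\tfrac{1}{w_n}\right)U^{T}F(x),
  \qquad \tfrac{1}{w_j} \text{ replaced by } 0 \text{ whenever } w_j = 0 .
\end{align}
```

Zeroing the reciprocals of the null singular values is what yields the least-squares correction when the linear system has no solution and the minimum-norm correction when it has several.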



5.2 Obtaining a New Point 

One problem of Newton's iterative method is that it may fail to converge if the
initial guess is not good enough. To correct this we followed a globally convergent
strategy that guarantees some progress toward the solution at each iteration. The fact
that the Newton vector δx defines a descent direction for |F|² guarantees that it is
always possible to obtain along that direction a point closer to a zero of F. This new
point must lie in the segment defined by x+λδx with λ∈[0..1]. The strategy to obtain
the new point consists of trying different λ values, starting with the largest possible
value without exceeding the search box limits, and backtracking to smaller values
until a suitable point is reached. The idea is that if x is close enough to the solution
then the Newton step has quadratic convergence. Otherwise a smaller step is taken
still directed to a solution (or a mere local minimum), guaranteeing convergence.
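
A minimal sketch of this local search, for the two-constraint example of subsection 5.1, is given below. Several simplifications are assumed and should be read as such: the Newton vector is obtained by solving the 2×2 linear system directly (Cramer's rule) instead of by SVD, so a singular Jacobian simply aborts the search; a point counts as a solution when every residual is below a tolerance, in place of the canonical-solution test; and the starting point, search box and numeric thresholds are hypothetical.

```python
import math

def F(x):
    """Residuals of x1^2 + x2^2 = 1 and x1*(x2 - x1) = 0."""
    x1, x2 = x
    return (x1 * x1 + x2 * x2 - 1.0, x1 * (x2 - x1))

def jacobian(x):
    x1, x2 = x
    return ((2.0 * x1, 2.0 * x2), (x2 - 2.0 * x1, x1))

def norm2(v):
    return sum(c * c for c in v)

def newton_vector(x):
    """Solve J*dx = -F(x) for a 2x2 system; None if J is singular."""
    (a, b), (c, d) = jacobian(x)
    f1, f2 = F(x)
    det = a * d - b * c
    if det == 0.0:
        return None
    return ((b * f2 - d * f1) / det, (c * f1 - a * f2) / det)

def max_step(x, dx, box):
    """Largest lambda in [0, 1] keeping x + lambda*dx inside the box."""
    lam = 1.0
    for xi, di, (lo, hi) in zip(x, dx, box):
        if di > 0.0:
            lam = min(lam, (hi - xi) / di)
        elif di < 0.0:
            lam = min(lam, (lo - xi) / di)
    return lam

def local_search(start, box, tol=1e-12, threshold=1e-14, max_iter=100):
    x = start
    for _ in range(max_iter):
        if max(abs(f) for f in F(x)) <= tol:      # step (ii): solution reached
            return x
        dx = newton_vector(x)                     # step (iii)
        if dx is None:
            return None
        lam, new = max_step(x, dx, box), None
        while lam > 1e-12:                        # step (iv): backtrack on lambda
            cand = tuple(xi + lam * di for xi, di in zip(x, dx))
            if norm2(F(cand)) < norm2(F(x)):
                new = cand
                break
            lam /= 2.0
        if new is None or math.dist(new, x) <= threshold:
            return None                           # step (v): no progress
        x = new                                   # step (vi)
    return None

search_box = ((-2.0, 2.0), (-2.0, 2.0))
root = local_search((1.0, 0.5), search_box)
```

From this starting point the full Newton step is accepted at every iteration and the search ends at a point where both residuals vanish to within the tolerance.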

As an alternative to the Newton approach, other minimization methods (such as the
Conjugate Gradient Method [14]) could have been directly applied to the scalar
function |F|². However, the early collapsing of the various dimensions of F into a
single one implies the lack of information about each individual constraint and makes 
these strategies more vulnerable to local minima. 



6 An Under-Constrained CCSP 

In this section we illustrate the concept of global hull-consistency with a simple
example of an under-constrained CCSP. There are two variables x1 and x2 with real
values ranging within [-5..5] and constrained by: x1²+x2²≤2² and (x1-1)²+(x2-1)²≥2.5².

Fig. 1. Local consistency and global hull-consistency on a simple CCSP.

Fig. 1 sketches the problem. The thick solid square is the initial domain box. The two 
circumferences represent the two constraints. The grey area represents the complete 
set of solutions. The thin solid square is the box obtained by enforcing a local 
consistency, either box-consistency or hull-consistency (on the decomposed problem). 
The dashed square is the box obtained by enforcing global hull-consistency. 

The figure shows that the local consistency criterion cannot prune the search space 
inside the smaller circumference - the pruning is the same as it would be without the 
constraint associated with the larger circumference. Depending on the decision 
problem to solve, this may be irrelevant or, on the contrary, it may justify a more time
consuming enforcement of a stronger consistency, such as global hull-consistency. 

Better pruning is obtained by applying the octree approach (see section 3.3), whose 
goal is much more ambitious, aiming at providing a compact description of the 
complete set of solutions. To achieve this goal the entire solution space is extensively 
partitioned. Fig. 2 compares the partitions obtained by this technique (with at most 6 
subdivisions) with the partitions obtained by the tree structured algorithm with local 
search for global hull-consistency (with 10'® precision limit -see next section). 





Fig. 2. Pruning by an extensive partitioning approach and the tree structured algorithm.

Only 3 boxes (leaves of the binary tree; the complete tree has a total of 5 boxes)
needed to be considered for enforcing global hull-consistency since the 4 extreme
canonical solutions (represented in the figure as small circles) were immediately
found. In the other approach 92 partitions had to be considered, delimiting with a
precision of 0.15625 (the size of the smaller squares) the whole boundary of the feasible
region. Note that in terms of a single box enclosing the complete set of solutions this
approach is much less precise than the global hull-consistency approach.

The above discussion justifies our belief that global hull-consistency may be a 
good compromise between less accurate local consistencies and extensive, but too 
demanding, partitioning approaches. 



6.1 Preliminary Results 

We now compare the performance of the tree-structured algorithm for enforcing 
global hull-consistency with the alternative algorithms discussed in section 4.1 
(SOLVE1, SOLVE2 and SOLVE3). Two versions of the algorithm are analysed:
version TSA0 has no local search (only an initial guess is tried) whereas version TSAls
implements local search as described in section 5. All algorithms use the same local
consistency criterion (box-consistency) for pruning each box domain. An extension of
the previous CCSP example to three dimensions was considered since the version
with two dimensions was easily solved by several of the above approaches. Several
different precision limits (ε) were also considered for the smallest acceptable sizes of
the domain boxes. Table I shows the results.



Table I - Execution times for different precision limits.

            SOLVE1        SOLVE2          SOLVE3          TSA0           TSAls
e=l0-^      1min 3.360s   4.010s          7.720s          0.660s         0.080s
e=10'®      > 15min       2min 6.720s     4min 19.220s    17.600s        1.700s
e=10'*      > 15min       10min 37.560s   > 15min         8min 23.800s   11.930s



The TSAls algorithm clearly outperformed all the other approaches. Moreover, the
version without local search (TSA0) was distinctly inferior for solving this problem,
despite exhibiting better efficiency than the best of the SOLVEi algorithms. The
enhancement obtained by local search is a direct consequence of the reduction in
domain partitioning. Table II compares the number of leaves in the binary tree
structure at the end of both algorithms (TSA0 and TSAls).

Table II - Number of leaves in the binary tree structure for different precision limits.

          e=10'^    e=m^    e=10'*
TSA0      309       9093    91134
TSAls     23        77      316


7 Conclusions

This paper addressed constraint solving over continuous domains in the context of
decision making, and discussed the trade-off between precision in the definition of the
solution space and the computational effort required. We showed that in this context,
our proposed approach to enforce global hull-consistency may be an appropriate
choice, achieving acceptable precision with relatively low computational cost.

Such effort depends on the algorithms used to enforce such consistency. Among
the set of algorithms we developed, the one that integrates local search with constraint
propagation, within a tree-structured representation of the domain, has clearly shown
the best performance, which makes a case for further research on the integration of
constraint propagation and local search methods within interval constraint solving.

Although the results were obtained in a particular, and simple, example, we believe
that this approach can scale up to more complex and realistic problems. This is clearly
a direction for further research that we intend to pursue, namely to extend our
previous work on continuous constraints with differential equations [5].

References

1. Benhamou, F., McAllester, D., Van Hentenryck, P.: CLP(Intervals) revisited. Logic
Programming Symposium, MIT Press (1994) 124-131.

2. Benhamou, F., Older, W.J.: Applying Interval Arithmetic to Real, Integer and Boolean
Constraints. Journal of Logic Programming (1997) 32(1):1-24.

3. Cleary, J.G.: Logical Arithmetic. Future Computing Systems (1987) 2(2):125-149.

4. Cruz, J., Barahona, P.: An Interval Constraint Approach to Handle Parametric ODEs for
Decision Support. Principles Practice Constraint Programming, Springer (1999) 478-479.

5. Cruz, J., Barahona, P., Benhamou, F.: Integrating Deep Biomedical Models into Medical
DSSs: an Interval Constraint Approach. AI in Medicine, Springer (1999) 185-194.

6. Cruz, J., Barahona, P.: Handling Differential Equations with Constraints for Decision
Support. Frontiers of Combining Systems, Springer (2000) 105-120.

7. Hansen, E.: Global Optimization Using Interval Analysis. Marcel Dekker (1992).

8. Sam-Haroud, D., Faltings, B.V.: Consistency Techniques for Continuous Constraints.
Constraints (1996) 1(1,2):85-118.

9. Lhomme, O.: Consistency Techniques for Numeric CSPs. IJCAI, IEEE Pr. (1993) 232-238.

10. Montanari, U.: Networks of Constraints: Fundamental Properties and Applications to
Picture Processing. Information Science (1974) 7(2):95-132.

11. Moore, R.E.: Interval Analysis. Prentice-Hall (1966).

12. Ortega, J., Rheinboldt, W.: Iterative Solution of Nonlinear Equations in Several Variables.
Academic Press (1970).

13. Puget, J-F., Van Hentenryck, P.: A Constraint Satisfaction Approach to a Circuit Design
Problem. Journal of Global Optimization, MIT Press (1997).

14. Stoer, J., Bulirsch, R.: Introduction to Numerical Analysis. Springer Verlag (1980).

15. Van Hentenryck, P., McAllester, D., Kapur, D.: Solving Polynomial Systems Using a
Branch and Prune Approach. SIAM Journal of Numerical Analysis (1997) 34(2).




Towards Provably Complete Stochastic 
Search Algorithms for Satisfiability 



Ines Lynce, Luis Baptista, and Joao Marques-Silva 



Department of Informatics, 
Technical University of Lisbon, 
IST/INESC/CEL, Lisbon, Portugal 
{ines,lmtb,jpms}@sat.inesc.pt



Abstract. This paper proposes a stochastic, and complete, backtrack 
search algorithm for Propositional Satisfiability (SAT). In recent years, 
randomization has become pervasive in SAT algorithms. Incomplete al- 
gorithms for SAT, for example the ones based on local search, often re- 
sort to randomization. Complete algorithms also resort to randomization. 
These include state-of-the-art backtrack search SAT algorithms that of-
ten randomize variable selection heuristics. Moreover, it is plain that the
introduction of randomization in other components of backtrack search 
SAT algorithms can potentially yield new competitive search strategies. 
As a result, we propose a stochastic backtrack search algorithm for SAT, 
that randomizes both the variable selection and the backtrack steps of 
the algorithm. In addition, we describe and compare different organiza- 
tions of stochastic backtrack search. Finally, experimental results provide 
empirical evidence that the new search algorithm for SAT results in a 
very competitive approach for solving hard real-world instances. 



1 Introduction 

Propositional Satisfiability is a well-known NP-complete problem, with extensive 
applications in Artificial Intelligence, Electronic Design Automation, and many 
other fields of Computer Science and Engineering. 

In recent years, several competitive solution strategies for SAT have been 
proposed and thoroughly investigated [10,9,11]. Advanced techniques applied 
to backtrack search algorithms for SAT have achieved remarkable improve- 
ments [3,7,9,11,12,15], having been shown to be crucial for solving large instances 
of SAT derived from real-world applications. Current state-of-the-art SAT solvers 
incorporate advanced pruning techniques as well as new strategies on how to or- 
ganize the search. Effective search pruning techniques include, among others, 
clause recording and non-chronological backtracking [3,9,11], whereas recent ef- 
fective strategies include search restart strategies [7]. Moreover, the work of S. 
Prestwich [12] (inspired by the previous work of others [6,13]) has motivated the 
utilization of randomly picked backtrack points in incomplete SAT algorithms. 
More recently, a stochastic systematic search algorithm has been proposed [8]. 

The remainder of this paper is organized as follows. Section 2 briefly sur-
veys SAT algorithms and the utilization of randomization in SAT. Afterwards,
Section 3 introduces a stochastic backtrack search SAT algorithm and the next 
section details randomized backtracking. Preliminary experimental results are 
presented and analyzed in Section 5. Finally, we conclude and suggest future 
research work in Section 6. 

2 SAT Algorithms 

Over the years a large number of algorithms have been proposed for SAT, 
from the original Davis-Putnam procedure [5], to recent backtrack search al- 
gorithms [3,9,11,15], among many others. 

SAT algorithms can be characterized as being either complete or incomplete.
Complete algorithms can establish unsatisfiability if given enough CPU time;
incomplete algorithms cannot. In a search context, complete algorithms are
often referred to as systematic, whereas incomplete algorithms are
referred to as non-systematic.

The vast majority of backtrack search SAT algorithms build upon the original 
backtrack search algorithm of Davis, Logemann and Loveland [4]. A generic 
organization of backtrack search for SAT considers three main engines: 

— The decision engine (Decide) which selects an elective variable assignment 
each time it is called. 

— The deduction engine (Deduce) which applies Boolean Constraint Propaga- 
tion, given the current variable assignments and the most recent decision 
assignment, for satisfying the CNF formula. 

— The diagnosis engine (Diagnose) which identifies the causes of a given con- 
flicting partial assignment. 

Recent state-of-the-art backtrack search SAT solvers [3,9,11,15] utilize sophis- 
ticated variable selection heuristics, implement fast Boolean Constraint Propaga- 
tion procedures, and incorporate techniques for diagnosing conflicting conditions, 
thus being able to backtrack non-chronologically and record clauses that explain 
and prevent identified conflicting conditions. Clauses that are recorded due to 
diagnosing conflicting conditions are referred to as conflict-induced clauses (or 
simply conflict clauses). 
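
The three-engine organization can be sketched in a few lines of Python. This is a minimal illustration, not the organization of any of the cited solvers: the function names `decide`, `deduce` and `solve` are ours, clauses use DIMACS-style integer literals, the decision engine uses no heuristic, and diagnosis is reduced to plain chronological backtracking (no conflict analysis and no clause recording).

```python
from typing import Dict, List, Optional

Clause = List[int]            # DIMACS-style literals: 3 means x3, -3 means NOT x3
Assignment = Dict[int, bool]

def decide(clauses: List[Clause], assignment: Assignment) -> Optional[int]:
    """Decision engine: return the first unassigned variable, or None."""
    for clause in clauses:
        for lit in clause:
            if abs(lit) not in assignment:
                return abs(lit)
    return None

def deduce(clauses: List[Clause], assignment: Assignment) -> Optional[Clause]:
    """Deduction engine: unit propagation; returns a falsified clause, if any."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return clause  # every literal is false: conflict
            if len(free) == 1:
                assignment[abs(free[0])] = free[0] > 0  # implied assignment
                changed = True
    return None

def solve(clauses: List[Clause],
          assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Backtrack search; returns a satisfying assignment or None."""
    assignment = dict(assignment or {})
    if deduce(clauses, assignment) is not None:
        return None  # simplified diagnosis: backtrack chronologically
    var = decide(clauses, assignment)
    if var is None:
        return assignment  # all variables assigned and no clause falsified
    for value in (True, False):
        model = solve(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None

def satisfies(clauses: List[Clause], model: Assignment) -> bool:
    """Check that every clause has at least one true literal."""
    return all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses)
```

A solver with non-chronological backtracking would replace the early `return None` by a call to a diagnosis routine that derives a conflict clause and a backtrack level.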

3 Stochastic Systematic Search 

In this section we describe how randomization can be used within backtrack 
search algorithms to yield a stochastic systematic search SAT algorithm. 

As previously explained in Section 2, a backtrack search algorithm can be 
organized according to three main engines: the decision engine, the deduction 
engine and the diagnosis engine. Given this organization, we define a backtrack 
search (and so systematic) SAT algorithm to be stochastic provided all three 
engines are subject to randomization: 

1. Randomization can be (and has actually been [2,3,11]) applied to the decision
engine by randomizing the variable selection heuristic. 

2. Randomization can be applied to the deduction engine by randomly picking 
the order in which implied variable assignments are handled during Boolean 
Constraint Propagation. 

3. The diagnosis engine can be randomized by randomly selecting the point to 
backtrack to. 

For the deduction engine, randomization only affects the order in which as- 
signments are implied, and hence can only affect which conflicting clause is 
identified first, and so it is not clear whether randomization of the deduction 
engine can play a significant role. As a result, we chose to randomize the two 
other engines of the backtrack search SAT algorithm. 

Since the randomization of the decision engine is simply obtained by ran- 
domizing the variable selection heuristic [2,3,11], in the next section we focus on 
the randomization of the diagnosis engine. 
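
As a rough illustration of a randomized decision engine, the sketch below picks uniformly at random among the unassigned variables whose score is close to the best one. Everything here is an assumption made for the example: the occurrence-count score, the `slack` parameter and the function name `randomized_decide` are hypothetical and are not taken from the heuristics of [2,3,11].

```python
import random
from collections import Counter
from typing import Dict, List, Optional

def randomized_decide(clauses: List[List[int]],
                      assignment: Dict[int, bool],
                      rng: random.Random,
                      slack: float = 0.9) -> Optional[int]:
    """Pick at random among the near-best unassigned variables."""
    # Hypothetical score: number of occurrences of each unassigned variable.
    counts = Counter(abs(lit)
                     for clause in clauses
                     for lit in clause
                     if abs(lit) not in assignment)
    if not counts:
        return None  # nothing left to decide
    best = max(counts.values())
    # Candidates are the variables scoring at least slack * best.
    candidates = sorted(v for v, c in counts.items() if c >= slack * best)
    return rng.choice(candidates)
```

Passing an explicit `random.Random` instance keeps runs reproducible for a given seed, which is convenient when results are reported as medians over several runs.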



4 Randomized Backtracking 

State-of-the-art SAT solvers currently utilize different forms of non-chronological 
backtracking, for which each identified conflict is analyzed, its causes identified, 
and a new clause created and added to the CNF formula. Created clauses are 
then used to compute the backtrack point as the most recent decision assignment 
from all the decision assignments represented in the recorded clause. 

The diagnosis engine of a non-chronological backtrack search algorithm can 
be randomized by randomly selecting the point to backtrack to. The conflict 
clause is then used for randomly deciding which decision assignment is to be 
toggled. This form of backtracking is referred to as random backtracking. 
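
The difference between the two ways of choosing the backtrack point can be sketched as follows. The sketch assumes, for illustration, that the conflict clause contains only literals of decision variables and that a map from variable to decision level is available; the name `choose_backtrack_level` is ours.

```python
import random
from typing import Dict, List, Optional

def choose_backtrack_level(conflict_clause: List[int],
                           decision_level: Dict[int, int],
                           rng: Optional[random.Random] = None) -> int:
    """Return the decision level whose assignment is to be toggled."""
    # Decision levels represented in the recorded conflict clause.
    levels = sorted({decision_level[abs(lit)] for lit in conflict_clause})
    if rng is None:
        # Non-chronological backtracking: most recent decision in the clause.
        return levels[-1]
    # Random backtracking: any decision represented in the clause.
    return rng.choice(levels)
```

Calling the function without `rng` gives the usual non-chronological choice, so the same code path can serve both modes, for instance when random backtracking is only applied once every K conflicts.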

In SAT solvers implementing non-chronological backtracking and clause re- 
cording, even with opportunistic clause deletion, the algorithms are guaranteed 
to be complete, because there is always an implicit explanation for why a solution 
cannot be found in the portion of the search space already searched. However, 
in order to relax this backtracking condition and still ensure completeness, ran- 
domized backtracking requires that all recorded clauses must be kept in the CNF 
formula. 

Moreover, there exists some freedom on how the backtrack step to the target 
decision assignment variable is performed and on when it is applied. For example, 
one can decide not to apply randomized backtracking after every conflict but 
instead only once after every K conflicts. 



4.1 Completeness Issues 

With randomized backtracking, clause deletion may cause already visited por- 
tions of the search space to be visited again. A simple solution to this problem is 
to prevent deletion of recorded clauses, i.e. no recorded conflict clauses are ever 
deleted. If no conflict clauses are deleted, then conflicts cannot be repeated, and
the backtrack search algorithm is necessarily complete. The main drawback of 
keeping all recorded clauses is that the growth of the CNF formula is linear in 
the number of explored nodes, and so exponential in the number of variables. 
However, as will be described in Section 4.2, there are effective techniques to 
tackle the potential exponential growth of the CNF formula. Moreover, experi- 
mental data from Section 5 clearly indicate that the growth of the CNF formula 
is not exponential in practice. 

It is important to observe that there are other approaches to ensure com- 
pleteness that do not necessarily keep all recorded conflict clauses: 

1. One solution is to increase the value of K each time a randomized backtrack 
step is taken. 

2. Another solution is to increase the relevance-based learning [3] threshold 
each time a randomized backtrack step is taken (i.e. after K conflicts). 

3. One final solution is to increase the size of recorded conflict clauses each 
time a randomized backtrack step is taken. 

Observe that all of these alternative approaches guarantee that the search
algorithm is eventually provided with enough space and/or time to either iden-
tify a solution or prove unsatisfiability. However, all strategies exhibit a key
drawback: paths in the search tree can be visited more than once. Moreover, even
when recording of conflict clauses is used, as in [9,11], clauses can eventually be
deleted and so search paths may be re-visited.

We should note that, as stated earlier in this section, if all recorded clauses 
are kept, then no conflict can be repeated during the search, and so no search 
paths can be repeated. Hence, as long as the search algorithm keeps all recorded 
conflict clauses, no search paths are ever repeated. 

4.2 Implementation Issues 

After (randomly) selecting a backtrack point, the actual backtrack step can be 
organized in two different ways: 

— One can non-destructively toggle the target decision assignment, meaning
that all other decision assignments are unaffected.

— One can destructively toggle the target decision assignment, meaning that 
all of the more recent decision assignments are erased. 

The two randomized backtracking approaches differ significantly. Destruc- 
tive randomized backtracking is more drastic and attempts to rapidly cause the 
search to explore other portions of the search space. Non-destructive randomized 
backtracking has characteristics of local search, in which the current (partial) 
assignment is only locally modified. 
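
The two organizations can be contrasted on a plain list of decision assignments. This is a simplified sketch: decisions are modelled as `(variable, value)` pairs in chronological order, implied assignments are ignored (a solver would recompute them by propagation), and the function names are ours.

```python
from typing import List, Tuple

Decision = Tuple[int, bool]  # (variable, assigned value)

def toggle_destructive(decisions: List[Decision], index: int) -> List[Decision]:
    """Toggle the target decision and erase all more recent decisions."""
    var, value = decisions[index]
    return decisions[:index] + [(var, not value)]

def toggle_non_destructive(decisions: List[Decision], index: int) -> List[Decision]:
    """Toggle the target decision and leave all other decisions unaffected."""
    var, value = decisions[index]
    return decisions[:index] + [(var, not value)] + decisions[index + 1:]
```

Both functions return a new list, so the caller still holds the previous decision sequence if it needs it for diagnosis.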

Another significant implementation issue is memory growth. Despite the 
growth of the number of clauses being linear in the number of searched nodes, for 
some problem instances a large number of backtracks will be required. However, 
there are effective techniques to tackle the potential exponential growth of the 
CNF formula. Next we describe two of these techniques: 

1. The first technique for tackling CNF formula growth is to opportunistically
apply subsumption to recorded conflict clauses. This technique is guaran- 
teed to effectively reduce the number of clauses that are kept in between 
randomized backtracks. 

2. Alternatively, a second technique consists of just keeping recorded conflict 
clauses that explain why each sub-tree, searched in between randomized 
backtracks, does not contain a solution. This process is referred to as iden- 
tifying the tree signature [1] of the searched sub-tree. 
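
For the first technique, recall that a clause subsumes another when its literals form a subset of the other's literals, so the larger clause is redundant. The following sketch removes subsumed clauses from a list of recorded clauses. It is a quadratic illustration written for clarity, not the opportunistic scheme of an actual solver, and the name `remove_subsumed` is ours.

```python
from typing import FrozenSet, Iterable, List

def remove_subsumed(clauses: Iterable[Iterable[int]]) -> List[FrozenSet[int]]:
    """Keep only clauses that are not subsumed by another clause."""
    # Deduplicate, then visit shorter clauses first: a clause can only be
    # subsumed by a strictly shorter one once duplicates are gone.
    unique = sorted({frozenset(c) for c in clauses}, key=len)
    kept: List[FrozenSet[int]] = []
    for clause in unique:
        if not any(k <= clause for k in kept):  # k <= clause: k is a subset
            kept.append(clause)
    return kept
```

For example, among the clauses (x1 ∨ x2 ∨ x3), (x1 ∨ x2) and (¬x1 ∨ x4), only the last two are kept.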

Regarding the utilization of tree signatures, observe that it is always possible 
to characterize a tree signature for a given sub-tree Tg that has just been searched 
by the algorithm. Each time, after a conflict is identified and a randomized 
backtrack step is to be taken, the algorithm defines a path in the search tree. 
Clearly, the explanation for the current conflict, as well as the explanations for 
all of the conflicts in the search path, provide a sufficient explanation of why
sub-tree Tg, that has just been searched, does not contain a solution to the 
problem instance. 

4.3 Randomized Backtracking and Search Restart Strategies 

It is interesting to observe that randomized backtracking strategies can be inter- 
preted as a generalization of search restart strategies. The latter always start the 
search process from the root of the search tree, whereas the former randomly se- 
lect the point in the search tree from which the search is to be restarted (assuming 
destructive backtracking is used). Moreover, observe that both approaches im- 
pose the same requirements in terms of completeness, and that the alternative 
techniques for completeness described in Section 4.1 for random backtracking 
also apply to search restart strategies. 

It is also interesting to observe that the two strategies can be used together.
In this case, each strategy Srb (for randomized backtracking) or Srst (for restarts)
is applied after every Krb or Krst conflicts, respectively. In general we assume
Krb < Krst, since Srst causes the search to explore new portions of the search
space that differ more drastically from those explored by Srb.
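
One possible way of interleaving the two strategies is sketched below. The rule that a restart takes priority when both strategies are due at the same conflict is our assumption, as are the function name and the returned labels.

```python
def strategy_after_conflict(conflicts: int, k_rb: int, k_rst: int) -> str:
    """Select the action to take after the given number of conflicts."""
    if conflicts % k_rst == 0:
        return "restart"            # Srst, every Krst conflicts
    if conflicts % k_rb == 0:
        return "random_backtrack"   # Srb, every Krb conflicts
    return "standard_backtrack"     # ordinary non-chronological backtracking
```

With `k_rb = 10` and `k_rst = 1000`, conflicts 10, 20, ... trigger random backtracking, and conflict 1000 triggers a restart instead.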

5 Experimental Results 

This section presents and analyzes experimental results that evaluate the ef- 
fectiveness of the techniques proposed in this paper in solving hard real-world 
problem instances. Recent examples of such instances are the superscalar pro- 
cessor verification problem instances developed by M. Velev and R. Bryant [14]. 
We consider four sets of instances: sss1.0a with 9 satisfiable instances, sss1.0
with 40 selected satisfiable instances, sss2.0 with 100 satisfiable instances, and
sss-sat-1.0 with 100 satisfiable instances. For all the experimental results pre-
sented in this section a PIII @ 866MHz Linux machine with 512 MByte of RAM
was used. The CPU time limit for each instance was set to 200 seconds, except
for the sss-sat-1.0 instances, for which it was set to 1000 seconds. Since random-
ization was used, the number of runs was set to 10 (due to the large number of
problem instances being solved). Moreover, the results shown correspond to the
median values for all the runs.

In order to analyze the different techniques, a new SAT solver, Quest0.5,
has been implemented. Quest0.5 is built on top of the GRASP SAT solver [9],
but incorporates restarts as well as random backtracking. Random backtracking
is applied non-destructively after every K backtracks¹. Furthermore, in what
concerns implementation issues (see Section 4.2), the backtracking point is se-
lected from the union of the recorded conflict clauses in the most recent K
conflicts and the tree signature of each sub-tree is kept in between randomized
backtracks.

Moreover, for the experimental results, a few configurations were selected: 

— Rst1000+inc100 indicates that restarts are applied after every 1000 back-
tracks (i.e. the initial cutoff value is 1000), and the increment to the cutoff
value after each restart is 100 backtracks. (Observe that this increment is
necessary to ensure completeness.)

— Rst1000+ts also applies restarts after every 1000 backtracks
and keeps the clauses that define the tree signature when the search is
restarted. The cutoff value is kept fixed at 1000, since
completeness is guaranteed.

— RB1 indicates that random backtracking is taken at every backtrack step.

— RB10 applies random backtracking after every 10 backtracks.

— Rst1000+RB1 means that random backtracking is taken at every back-
track and that restarts are applied after every 1000 backtracks. (The identi-
fication of the tree signature is used for both randomized backtracking and
for search restarts.)

— Rst1000+RB10 means that random backtracking is taken after every 10
backtracks and also that restarts are applied after every 1000 backtracks.
(The identification of the tree signature is used for both randomized back-
tracking and for search restarts.)

The results for Quest0.5 on the SSS instances are shown in Table 1. In this
table, Time denotes the CPU time, Nodes the number of decision nodes, and
X the average number of aborted problem instances. As can be observed, the
results for Quest0.5 reveal interesting trends:

— Random backtracking taken at every backtrack step allows significant reduc-
tions in the number of decision nodes.

— The elimination of repeated search paths in restarts, when based on identi-
fying the tree signatures and when compared with the use of an increasing
cutoff value, helps reduce the total number of nodes and CPU time.

— The best results are always obtained when random backtracking is used,
regardless of whether it is used together with restarts.

¹ For Quest0.5 we chose to use the number of backtracks instead of the number of
conflicts, due to how the original GRASP code is organized [9].







Table 1. Results for the SSS instances

Inst            sss1.0a               sss1.0                sss2.0                sss-sat-1.0
Quest0.5        Time   Nodes     X    Time   Nodes     X    Time   Nodes     X    Time   Nodes     X
Rst1000+inc100  …      59511     0    …      188798    0    1412   494049    0    50512  8963643   39
Rst1000+ts      …      52850     0    345    143735    0    1111   420717    0    47334  7692906   28
RB1             79     11623     0    231    29677     0    313    31718     0    10307  371277    1
RB10            …      43609     …    278    81882     0    464    118150    0    6807   971446    1
Rst1000+RB1     79     11623     0    221    28635     0    313    31718     0    10330  396551    2
Rst1000+RB10    84     24538     0    147    56119     0    343    98515     0    7747   1141575   0
GRASP           1603   257126    8    2242   562178    11   13298  3602026   65   83030  12587264  82



— Rst1000+RB10 is the only configuration able to solve all the instances in
the allowed CPU time for all runs.

The experimental results reveal additional interesting patterns. When com-
pared with the results for GRASP, Quest0.5 yields dramatic improvements. Fur-
thermore, even though the utilization of restarts reduces the amount of search,
it is also clear that more significant reductions can be achieved with random-
ized backtracking. In addition, the integrated utilization of search restarts and
randomized backtracking yields the best results, thus motivating the
utilization of multiple search strategies in backtrack search SAT algorithms.



6 Conclusions and Future Work 

This paper proposes and analyzes the application of randomization in the dif- 
ferent components of backtrack search SAT algorithms. A new, stochastic but 
complete, backtrack search algorithm for SAT is proposed. 

In conclusion, the contributions of this paper can be summarized as follows: 

1. A new backtrack search SAT algorithm is proposed, that randomizes the 
variable selection and the backtrack steps. 

2. The proposed SAT algorithm is shown to be complete, and different ap- 
proaches for ensuring completeness are described. 

3. Randomized backtracking is shown to be a generalization of search restart 
strategies, and their joint utilization is proposed. 

4. Experimental results clearly indicate that significant savings in search effort 
can be obtained for different organizations of the proposed algorithm. 

In the near future, we expect to consider other variations of this new al- 
gorithm. We can envision establishing a generic framework for implementing 
backtracking strategies, allowing the implementation of different hybrids, all 
guaranteed to be complete and so capable of proving unsatisfiability. 

Abstract. In this paper we present a three-phase structure algorithm
devised within a constraint programming paradigm to solve real-life
instances of single-track railway scheduling problems. The combination
of a hill-climbing strategy, which eases the process of finding iteratively
improved solutions, and a branch-and-bound strategy allows us to solve 21
real-life problems in a reasonable time, 19 of them to optimality.

In addition, this paper discusses a group of practical constraints, incor-
porated into the software, that arise in real-life problems and to which little
attention has hitherto been paid. Results comparing the gain from using
this approach on large problem instances are also presented.

1 Introduction 

The railway services have increasingly become more competitive and so they 
demand more new features in the planning process. Hence, a tool to help the 
planner to meet changes in passenger demand is desirable. 

This work is concerned with the single-track railway scheduling problem,
where trains are only allowed to pass one another at stations, sidings and
double-track sections. In this paper stations, sidings and double-track sections
will all be named passing points and, therefore, will be considered as similar
entities for the model proposed in Section 2. The single-track sections of the
line, between passing points, are divided into track segments, and at most one
train can be on any track segment at any time.

Unlike a track segment, a passing point has a specified limit > 1 on the 
number of trains it can hold at any one time. A conflict can occur when two or 
more train services are planned to use the same track segment at the same time. 
Likewise, at a passing point a conflict occurs if its capacity is exceeded. 

In order to resolve a conflict, one of the conflicting trains must be delayed at
that track section or passing point.

We present in this paper a model which maps this problem into a special
case of the job-shop scheduling problem. The objective is to minimize the
total tardiness of the trips [8].

* The author is a PhD student funded by CAPES under Grant BEX-1162/96-9. 

A hill-climbing procedure is devised which uses the critical path structure of
a solution of the problem and iteratively improves the solution by swapping pairs
of critical tasks. This local search procedure is helped by a branch-and-bound
algorithm when it is trapped in a local minimum.

The single-line track railway scheduling problem has been tackled by mod-
elling it as a Mathematical Integer Programming (MIP) [5, 3] problem. Consid-
ering the NP-completeness of this problem [2], some meta-heuristic approaches
have also been proposed [2].

The aim of previous authors was basically to resolve the conflicts between
trains arising from a given desirable timetable [6]. However, there are other con- 
straints which train-operating companies would like to see also incorporated into 
a tool. This would provide them with the possibility of offering better planning 
services. Therefore, this tool must be capable of both resolving conflicts and 
satisfying the additional constraints. 

The aim of this work is to present a Constraint Programming (CP) approach
to this problem and also four practical situations which demand more sophis-
ticated constraints than just the avoidance of conflicts. These sorts of constraints
have hitherto been ignored in the literature [7].

The first of them, which is very useful in practical situations, allows the 
planner to specify a meeting station for a pair of trains and a minimum dwell 
time during which they must be together at the station, for changing either crew 
or goods between these two trains. The second provides the dispatcher with the 
possibility of specifying that a particular vehicle must be used to form another 
trip. In this case, the second trip must necessarily be scheduled after the first 
trip ends, plus the minimum specified time. The third is the blocking of a track 
segment for a certain period of time for maintenance. Finally, the planner can 
also specify a time headway limit. This limit specifies a minimum time which 
trains must stay apart from each other during the trip either for safety or any 
other operational reasons. These constraints can be taken into account in the 
current implementation. 

The numerical results show that the CP approach proposed here is a promis- 
ing alternative to MIP for scheduling real-world instances of single-track railway 
problem. 

This paper is organized as follows. In Section 2 a model is presented which 
maps the single-track railway scheduling problem into a special case of Job-Shop 
Scheduling problem. 

An outline of the algorithm used to perform the experiments is given in
Section 3, and in Section 4 the results of solving 21 problems gathered from
Higgins’ work [3] are presented. The conclusions are given in Section 5.

2 The Disjunctive Graph Model 

The single-track railway can be modelled as a special case of the job-shop schedul-
ing problem. This can be achieved by considering the train trips as jobs, which
will be scheduled on tracks regarded as machines. A train trip may consist of 
many operations that require traversing from one point to another on a track.
Each of these distinct points can be a station or a signal placed along the track 
separating track segments. In order to eliminate a conflict an operation is chosen 
from the pair of conflicting operations to be delayed up to the point where the 
conflict is resolved. 

The job-shop scheduling problem is a class of combinatorial problems well
known in the OR and AI literature and defined as follows. Given is a set of
jobs $J$. For each job $J_i \in J$ a set of operations $J_i = \{o_{i1}, o_{i2}, \ldots, o_{ik}\}$ is also
specified. Each of these tasks requires processing on a unique machine $r_j \in R$,
the set of machines. A machine $r_j$ has capacity > 1 to perform more than one
task simultaneously when representing a passing point, and capacity = 1 when
representing a track segment.

The processing time $p_{ij}$ of each operation $o_{ij}$ is also given as input
for the problem. In addition, each job has its release date $d_i$ and its
expected completion date $c_i$. If $C_i$ denotes the actual completion time of
job $J_i$, then the tardiness $T_i$ of job $J_i$ is defined as
$\max\{C_i - c_i, 0\}$. The aim is then to minimize the total tardiness given
by the expression:

$$D = \sum_{J_i \in J} T_i \qquad (1)$$
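As a small illustration of Equation (1), the total tardiness can be computed directly from the expected and actual completion times. This is only a sketch: the function name, the dictionary layout and the sample values are assumptions made for the example, not part of the paper.

```python
def total_tardiness(actual, expected):
    """Return the sum over all jobs of max(C_i - c_i, 0), as in Equation (1)."""
    return sum(max(actual[job] - expected[job], 0) for job in expected)


# Hypothetical completion times (in minutes) for three train trips.
expected = {"T1": 100, "T2": 120, "T3": 90}
actual = {"T1": 110, "T2": 115, "T3": 90}

print(total_tardiness(actual, expected))  # 10: only T1 is late
```

A trip that arrives early (T2 here) contributes zero rather than a negative amount, which is what distinguishes tardiness from lateness.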

This problem can elegantly be represented by a disjunctive graph [8]. Consider
a directed graph $G = (N, A, B)$, where $N$ is the set of all the processing
operations $o_{ij}$ of each job $J_i$, each with a fixed processing time
$p_{ij}$ and to be performed without interruption. $A$ is a set of conjunctive
arcs which represent the sequence of operations of a particular job, whereas
$B$ is the set of pairs of disjunctive arcs linking tasks performed on the same
machine.

The conjunctive arcs of $A$ are in fact precedence constraints so that, for
each pair of successive operations $(o_{ij}, o_{i(j+1)})$ of job $J_i$,
$o_{ij}$ must be performed before $o_{i(j+1)}$.

Each machine which represents a track segment can only perform one operation
at a time; therefore, every pair of operations performed on the same machine is
linked by two directed arcs in the set $B$. The semantics of these disjunctive
arcs is that one arc sets a possible precedence order between the two tasks,
$(o_{ij} \to o_{i'j'})$, and the other arc sets the alternative order
$(o_{i'j'} \to o_{ij})$. Both alternative orders are kept until an order
between the tasks on the machine is determined.

The conjunctive arcs $(o_{ij}, o_{i(j+1)}) \in A$ linking two consecutive
operations of job $J_i$ have associated with them the processing time $p_{ij}$,
that is, the processing time of task $o_{ij}$. Associated with the arc linking
two disjunctive operations, for instance $(o_{ij}, o_{i'j'})$, is $p_{ij}$, and
associated with the alternative arc linking the same pair of tasks,
$(o_{i'j'}, o_{ij})$, is $p_{i'j'}$, in case $o_{i'j'}$ is instead performed
before $o_{ij}$ on the machine.

To the graph $G$ we add a dummy node $S$ as source node. This node represents
a dummy task and links to the first task of each job. The processing time
associated with the arc between these two tasks is $d_i$, the release time of
job $J_i$.
Besides the set of conjunctive and disjunctive arcs which exist due to the
characteristics of the problem, we may also have in the graph $G$ some other
fixed additional arcs introduced by the special constraints described in
Section 1.

We have $n = |J|$ dummy nodes $N_i$ to represent the sink nodes. Each of
these nodes denotes the tardiness of job $J_i$. The reason for having $n$ sink
nodes rather than one is that we are minimizing the total tardiness instead of
a single maximum completion time, as is usual when the objective function is
the makespan. Moreover, a change in the order of an operation on a machine can
also cause a change in the completion time of other jobs.

In order to have a feasible schedule it is necessary to transform the
disjunctive graph of the given timetable into an acyclic graph. This is done
by choosing an order for the disjunctive activities. In the train context, this
means choosing which service is going to use the disputed track segment first.
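The following sketch illustrates the idea of orienting the disjunctive arcs: once one arc of each disjunctive pair is kept, earliest start times follow from a longest-path computation, and a cyclic orientation is reported as infeasible. The node names, the two-train instance and the function `earliest_starts` are assumptions chosen for illustration; the paper's own implementation relies on Ilog Scheduler.

```python
from collections import defaultdict, deque


def earliest_starts(arcs):
    """Longest-path start time of every node, given weighted arcs (u, v, w).

    Raises ValueError when the chosen orientation contains a cycle.
    """
    succ, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for u, v, w in arcs:
        succ[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))
    start = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:  # process nodes in topological order (Kahn's algorithm)
        u = queue.popleft()
        visited += 1
        for v, w in succ[u]:
            start[v] = max(start[v], start[u] + w)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("cyclic orientation: no feasible schedule")
    return start


# Train A runs segment s1 then s2; train B runs s2 then s1 (opposite direction).
conjunctive = [("S", "a1", 0), ("a1", "a2", 10), ("S", "b1", 0), ("b1", "b2", 6)]
# One arc kept from each disjunctive pair: a1 before b2 on s1, b1 before a2 on s2.
selection = [("a1", "b2", 10), ("b1", "a2", 6)]
starts = earliest_starts(conjunctive + selection)
print(starts["a2"], starts["b2"])  # 10 10
```

Choosing the opposite arcs, b2 before a1 and a2 before b1, closes a cycle through all four operations, which corresponds to two trains each waiting for the other to clear a segment.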

Any operation $o_{ij}$ whose execution time coincides, either totally or in
part, with that of another operation $o_{i'j'}$ being performed on the same
track segment is considered to be in conflict with it. To arbitrate an order
between any conflicting pair of operations we use a dispatch rule to decide
which stretch of trip (operation) uses that track segment first, while the
other trip is delayed. This way only one of the disjunctive arcs is left in the
disjunctive graph. The order of non-conflicting activities is preserved because
they do not have any impact on the total delay (see Equation 1) we want to
minimize.
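Conflict detection as described above amounts to an interval-overlap test restricted to operations of different trains on the same track segment. The sketch below assumes a hypothetical timetable layout (operation id mapped to train, segment, start and end) and half-open time intervals; neither detail is specified in the paper.

```python
from itertools import combinations


def find_conflicts(ops):
    """Return pairs of operation ids of different trains overlapping on a segment.

    ops maps an operation id to (train, segment, start, end); end is exclusive.
    """
    conflicts = []
    for (a, (ta, sa, s1, e1)), (b, (tb, sb, s2, e2)) in combinations(sorted(ops.items()), 2):
        if ta != tb and sa == sb and s1 < e2 and s2 < e1:
            conflicts.append((a, b))
    return conflicts


# Hypothetical unresolved timetable for two trains running in opposite directions.
ops = {
    "a1": ("A", "s1", 0, 10),
    "a2": ("A", "s2", 10, 15),
    "b1": ("B", "s2", 0, 6),
    "b2": ("B", "s1", 6, 16),
}
print(find_conflicts(ops))  # [('a1', 'b2')]
```

Only the pair on segment s1 is reported: on s2 train B has already left at time 6, before train A enters at time 10.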

Whatever criterion is chosen to arbitrate the order of the conflicting
activities, once the unique time interval for the execution of every operation
in the graph is determined, some operations will have an important role in the
graph structure: they will be the critical operations. The critical path is
formed by these particular operations. Given a feasible solution, the
activities whose schedule order cannot be changed without also changing the
value of the objective function [8] are those of the critical path.

A neighbouring feasible solution can be obtained by reversing the orientation
of an arc between two critical activities performed on the same machine.
In the train scheduling context, this means opting for a different ordering
between two conflicting stretches of trips.

In the following section we describe the three-stage algorithm we are proposing.

3 The Three-Stage Algorithm 

Our algorithm consists of three stages and is developed with the Ilog Scheduler
[4]. In the first stage a one-step branch-and-bound is used just to find an
initial solution for the problem. In the second stage a local search procedure
uses the critical path of the solution in order to find an improved feasible
neighbour. In the third stage the branch-and-bound used for the first stage is
applied again. However, given the current upper-bound, the aim in this stage is
to free the local search from a local minimum cost solution by finding an
improved solution, if there is one.
Therefore, the algorithm described in this paper takes advantage of the best
of the local search strategies and the enumerative strategies: from the first,
for walking quickly through improved solutions and finding tighter bounds for
the problem, and from the second, for thoroughly searching the search space for
the optimal solution.

The devised branch-and-bound algorithm iterates chronologically through
the list of conflicting operations. For each pair of conflicting operations it
fixes an arc orientation between them so that we have an acyclic graph (see
Section 2) after resolving all the conflicting operations. A solution is found
when the algorithm reaches a leaf of the search tree; at this point the cost of
the solution found is posted as a new upper-bound for the next solution.

To decide the order between a pair of conflicting operations we use the Short- 
est Processing Time (SPT) dispatch rule [8]. The SPT rule chooses the order 
between two tasks which yields the least additional total delay. 
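The dispatch decision can be sketched for a single pair of conflicting operations: evaluate both orders and keep the one with the smaller delay. This is a simplified illustration that only measures the delay on the disputed segment itself, not the total delay propagated to the rest of the timetable; the field names `ready`, `duration` and `due` are assumptions for the example.

```python
def resolve_conflict(first, second):
    """Pick the order of two conflicting operations with the smaller local delay.

    Each operation is a dict with the (hypothetical) keys 'ready', 'duration'
    and 'due'. Returns the chosen (earlier, later) pair and its delay.
    """
    def delay(x, y):
        end_x = x["ready"] + x["duration"]
        end_y = max(y["ready"], end_x) + y["duration"]  # y waits for the segment
        return max(end_x - x["due"], 0) + max(end_y - y["due"], 0)

    d_xy, d_yx = delay(first, second), delay(second, first)
    return ((first, second), d_xy) if d_xy <= d_yx else ((second, first), d_yx)


a = {"name": "A", "ready": 0, "duration": 10, "due": 10}
b = {"name": "B", "ready": 6, "duration": 4, "due": 10}
order, local_delay = resolve_conflict(a, b)
print(order[0]["name"], local_delay)  # A 4
```

Letting A go first delays B by 4 time units, whereas letting B go first would delay A by 10, so the first order is kept.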

During the process of resolving conflicting operations new conflicts can be 
created. These new conflicting operations do not ever precede in time the already 
resolved conflicts. 

The local search we propose here is a hill-climbing strategy based upon the 
introduction of a perturbation to the critical path of a solution. A perturbation 
to the critical path (see description in Section 2) is provoked by changing the 
orientation of the arc of a former pair of conflicting operations. This perturbation 
is carried out to generate a feasible neighbour until a better solution than the 
current solution is found. 

Many variants of local search have been proposed in the last decade for
dealing with the job-shop scheduling problem [9]. Usually a local search
strategy is used as an alternative to enumerative algorithms due to the high
computational cost of the complete search. Unlike this usual approach, Baptiste
et al. [1] presented a hybrid approximation and enumerative algorithm to speed
up the process of reaching near-optimal solutions.

The structural difference between their approach and ours is that their local 
search works up to a certain number of iterations without improvement and 
then stops. The enumerative algorithm is then launched to search for the optimal 
solution. Our branch-and-bound algorithm is instead used to free the local search 
from the local minimum, if a better solution can be found, returning the control 
to the local search afterwards. 

As already mentioned, new conflicts might be created due to the modifications
carried out as described above. The algorithm deals with the new conflicts,
using the chosen dispatch rule, by setting up an orientation between the
activities of each new conflicting pair. When all the remaining conflicts are
resolved, the cost of this new solution is compared against the current
upper-bound. An improved solution is taken as the new current solution and its
cost is posted as a new upper-bound for the next solution. Otherwise, another
neighbour is tried.

The hill-climbing algorithm is trapped in a local minimum when all the
neighbours have been tried without leading to any improvement. The algorithm
then hands control over to the branch-and-bound algorithm to search for a
better solution, if there is one. When an improved solution is found, control
is returned to the local search; otherwise, the optimal solution has been
found.
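The control flow between the local search and the branch-and-bound can be sketched as a generic loop. The sketch below is problem-independent and uses a toy one-dimensional cost function in place of a railway timetable; the function names and the toy instance are assumptions, and the stand-in `branch_and_bound` is a plain scan rather than a real enumerative search.

```python
def three_stage(first_solution, neighbours, cost, branch_and_bound):
    """Alternate hill climbing with a bounded complete search.

    first_solution:          feasible solution produced by stage 1
    neighbours(s):           feasible neighbours of s (stage 2)
    branch_and_bound(bound): a solution with cost < bound, or None (stage 3)
    """
    current = first_solution
    while True:
        improved = True
        while improved:  # stage 2: hill climbing until a local minimum
            improved = False
            for candidate in neighbours(current):
                if cost(candidate) < cost(current):
                    current, improved = candidate, True
                    break
        better = branch_and_bound(cost(current))  # stage 3: escape the minimum
        if better is None:
            return current  # nothing cheaper exists: current is optimal
        current = better


# Toy instance: positions 0..9 with a cost landscape full of local minima.
f = [5, 4, 6, 3, 7, 2, 8, 1, 9, 0]
best = three_stage(
    0,
    lambda x: [y for y in (x - 1, x + 1) if 0 <= y < len(f)],
    lambda x: f[x],
    lambda bound: next((x for x in range(len(f)) if f[x] < bound), None),
)
print(best, f[best])  # 9 0
```

In this toy run the hill climber is trapped several times and each time the complete search supplies a strictly better starting point, mirroring the role of the LM counter reported in the experiments.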



4 Numerical Results 

The dataset used to carry out our experiments is the one used by Higgins [3],
which is a real-world sample of problems.



Table 1. Problems’ classification according to their number of conflicts.

Problems                       N° of            N° of
Number     Total    Inbound    Passing points   Conflicts
20-29      7        4          6                6-11
40-46      9        4          6                8-14
50-51      25       12         12               27, 35
60, 61     30       15         16               62, 69



The program described in Section 3 is used here to solve a set of problems, 
whose characteristics are shown in Table 1. Within this set of problems there are 
none of the special constraints we suggested in Section 1, although our program 
can handle them [7]. Therefore, the aim here is twofold. First, to evaluate the 
performance of the proposed combined approach on solving this set of real-world 
problems. Second, to compare the performance of the combined approach against 
that obtained by the branch-and-bound alone. 

In Table 1 the Number column gives the problem numbers. Total is the total
number of trains to schedule in the problem. Inbound is the number of inbound
trains. N° of Passing points states the number of passing points which trains
can use to pass each other, and N° of Conflicts is the range of conflicts which
the problems of the group have.

The problems are classified according to the number of conflicts they have in
their given original timetable. Higgins suggested that the number of conflicts
in a problem is a good measure of its difficulty [3].

We used our combined program to solve the 21 problems described in Table 1
and the results are shown in Table 2. These experiments were performed
on a networked PC Pentium III.

The First Solution columns show the time to find the first feasible solution
to a problem and its respective delay. Likewise, the Best Solution columns show
the time to reach the best solution for that particular problem and the overall
minimum delay.

The Iterative Improvement columns show the first local minimum solution
reached by the local search method proposed here and described in Section 3.
The Time, in these columns, is the time necessary to find the first improved
local minimum solution and the Delay shows its value. When no improved solution
can be found, i.e. when the heuristic is trapped in a local minimum from the
start, the last local minimum cost solution found is shown instead. The LM
number shows how many times the local search got trapped in a local minimum
and the branch-and-bound algorithm was called to free the heuristic from it.

Table 2. Results of solving 21 problems by using the combined approach.

          First Solution    Iterative Improvement         Best Solution
Problem   Time    Delay     Time     Delay         LM     Time     Delay    CPU
20        0.03    1815      0.37     1775          14     5.22     1224     5.27
21        0.03    253       0.14     210           1      0.14     175      0.15
22        0.03    294       0.08     238           2      0.14     137      0.15
23        0.02    116       0.06     151           1      0.06     116      0.07
24        0.03    517       0.12     391           2      0.22     349      0.25
25        0.03    410       0.12     285           1      0.12     285      0.14
26        0.02    344       0.07     352           2      0.13     338      0.15
27        0.02    303       0.09     310           2      0.16     161      0.17
28        0.02    249       0.07     257           1      0.07     249      0.09
29        0.03    409       0.12     321           2      0.23     223      0.24
40        0.02    350       0.09     282           2      0.15     215      0.16
41        0.02    199       0.11     334           1      0.11     199      0.12
42        0.01    441       0.16     439           4      0.51     388      0.58
43        0.03    716       0.15     690           4      0.83     560      1.03
44        0.01    427       0.14     476           2      0.23     389      0.26
45        0.03    359       0.09     291           1      0.09     228      0.11
46        0.02    526       0.12     526           1      0.12     526      0.21
50        0.17    805       3.64     565           2      5.98     555      22.85
51        0.21    1445      18.27    810           6      406.13   650      463.78
60        0.90    504       92.88    318           -      -        -        -
61        1.09    800       51.83    345           -      -        -        -

The CPU column shows the total time in seconds, including the time to
find the best solution and prove its optimality. The symbol "-" indicates that
no solution has been found after 15 hours. For instance, the combined algorithm
managed to find good solutions for problems 60 and 61 in about 3 min. These
solutions are not reached by the branch-and-bound when running alone for up to
15 hours.

The results for problems 20 and 51 draw our attention to the number of times
the iterative improvement strategy was trapped in a local minimum and the
corresponding amount of time to reach the best solution, 5.22 and 406.13
respectively. In addition to showing that the local search part of the program
did not help the algorithm much in walking smoothly through improved solutions
for these two problems, these results also show the poor quality of the
neighbourhood in yielding better solutions for these particular problems.

For problems 23, 26, 27, 28, 41 and 44, the local search was not able to find a
better solution than the initial solution. This is shown in the Iterative
Improvement columns of Table 2 by the delay value. However, after the improved
solution found by branch-and-bound, the local search managed to find other
solutions for problems 26, 27, and 44; this is shown by the fact that the
branch-and-bound was called more than once in these cases.

Among these 21 problems, the optimal solution was promptly found for four
problems: 23, 28, 41 and 46. With the exception of problem 46, the
iterative method did not manage to find these optimal solutions from their
respective neighbourhoods in order to meet the cost of the first solution.

The combined algorithm proposed here did improve the performance in finding
better solutions for the large problems (51, 60, and 61). However, it is still
difficult to reach the optimal solution and prove optimality for problems 60 and
61 within a maximum of 15 hours.



5 Conclusions 

This paper presents a model which maps the Single-Track Railway scheduling 
problem to a Job-Shop Scheduling problem. Some practical constraints that 
arise in train operating context are described and incorporated into the solving 
program. 

The results show that CP is a promising alternative to MIP for solving
real-life instances of problems, especially because of the possibility of
combining different strategies to get the best of each. In addition, some
practical constraints which are hard to specify in MIP are manageable with CP.

Abstract. Dealing with temporality on actions presents an important
challenge to AI planning. In Graphplan-based planners, levels of
propositions and actions alternate in a regular way; introducing
temporality on actions unbalances this symmetry. This paper presents
TPSYS, a Temporal Planning SYStem, which arises as an attempt to
combine the ideas of Graphplan and TGP to solve temporal planning
problems more efficiently. TPSYS is based on a three-stage process. The
first stage, a preprocessing stage, facilitates the management of
constraints on the duration of actions. The second stage expands a temporal
graph and obtains the set of temporal levels at which propositions and
actions appear. The third stage, the plan extraction, obtains the plan
of minimal duration by finding a flow of actions through the temporal
graph. The experiments show the utility of our system for dealing with
temporal planning problems.



1 Introduction 

In many real world planning problems, time plays a crucial role. In these
problems it is necessary to discard the assumption that actions have the same
duration. For instance, it is clear that in a logistics domain the operator
fly-plane may take longer than load-plane, and the action
fly-plane(London, Moscow) takes longer than fly-plane(London, Paris). Hence,
dealing with temporal planning problems requires handling a set of more complex
constraints, since the difficulty does not only lie in the process of action
selection [1], but also in the process of selecting the right execution times
for actions. In addition, the criterion for optimization changes because in
temporal contexts the interest lies in obtaining a plan of minimal duration
rather than a plan with a minimal number of actions. Therefore, finding the
plan which minimizes the global duration becomes an important issue in temporal
planning.

Autonomous Mobile Robots of the Universidad Politecnica de Valencia.

Dealing with temporality in planning has also been tackled by introducing
independent scheduling techniques in charge of doing all the reasoning on
time and resources. Under this approach of two separate processes, the planner
obtains a plan, which is validated afterwards by the scheduler. If the
scheduler determines that this plan is infeasible, the planner must obtain a
new plan and the same procedure is repeated. Obviously, this approach entails a
large overhead in the overall process. For that reason, a few attempts at
integrating planning and scheduling are being carried out [2]. TPSYS [3]
follows the guidelines of these latter approaches by achieving a total
integration of time and actions in the same framework.

This paper extends the work in [3] and aims to solve the above drawbacks. It 
builds on the work of Smith and Weld (the Temporal Graphplan algorithm, TGP, 
presented in [4]) and examines the general question of including temporality on 
actions in a Graphplan-based approach [5] by guaranteeing the plan of minimal 
duration. We present a Temporal Planning SYStem (TPSYS) which consists of 
three stages: a preprocessing stage, a temporal graph expansion stage and a plan 
extraction stage. The main features of TPSYS are: 

— It is able to handle overlapping actions of different duration and guarantees 
the optimal plan, i.e. the plan of minimal duration. 

— It defines a new classification of mutual exclusion relations: static mutexes 
which are time independent and dynamic mutexes which are time dependent 
and transitory. 

— It expands a relaxed temporal graph (from now on TG) without maintaining 
no-op actions nor delete-edges. The TG is incrementally expanded through
temporal levels defined by the instants of time at which propositions appear. 

— It performs a plan extraction (from now on PE) stage by selecting the ap- 
propriate actions in the TG to achieve the problem goals and guaranteeing 
the plan of minimal duration. Consequently, the algorithm can also solve 
problems of the type ‘obtain a plan of duration shorter than $D_{max}$’.

This paper is organized as follows. Section 2 discusses some related work in 
temporal planning and briefly introduces the main differences between TGP and 
TPSYS. The three stages of TPSYS, the necessary definitions and an application 
example of the proposed system are presented in section 3. Some experimental 
results, comparing the performance of TGP and TPSYS are shown in section 4. 
Finally, the conclusions of the work and the future lines on which we are working 
now are presented in section 5. 



2 Related Work 

Despite the great effort to introduce temporal constraints and resources in
planning, none of the existing works has been widely used. Presumably, the
reason is that they do not perform well when dealing with temporal problems
due to the complexity these problems entail. If we focus on the last decade, one
of the first temporal planners was O-Plan [6], which integrates both planning
and scheduling processes into a single framework. Then, other planners such as
ZENO [7] or IxTeT [8] appeared in the literature. Although IxTeT does not
manage disjunctive constraints, it deals with resource availability and
temporal constraints by using a TCN (Temporal Constraint Network [9]) to
represent constraints on time points. An alternative system for integrating
planning and scheduling is HSTS (Heuristic Scheduling Testbed System [10]),
which defines an integrated framework to solve planning and scheduling tasks in
specific domains of space missions. This system uses multi-level heuristic
techniques to manage resources under the constraints imposed by the action
schedule. The parcPLAN approach [11] manages multiple capacity resources with
actions which may overlap, instantiating time points in a similar way to our
approach. parcPLAN behaves efficiently if there are enough resources, but with
stricter resource limitations it becomes less efficient. Koehler [12] deals
with planning under resource constraints by means of time producer/consumer
actions but without incorporating an explicit model of time.

TGP [4] introduces a complex mutual exclusion reasoning to handle actions 
of differing duration in a Graphplan context. TPSYS combines features of both 
Graphplan and TGP and introduces new aspects to improve performance. While
TGP aims to demonstrate that its mutual exclusion reasoning remains valuable in 
a temporal setting, TPSYS aims to guarantee the optimal plan for any temporal 
planning problem. 

The reasoning on conditional mutex (involving time mutex) between actions, 
propositions and between actions and propositions is managed in TGP by means 
of inequalities which get complex in some problems. In fact, the application of 
conditional mutexes imposes a set of sophisticated formulae (even with binary 
disjunctions) which may imply an intractable reasoning on large problems [4]. 
On the contrary, the reasoning process in TPSYS is simplified thanks to the 
incorporation of several improvements: 

— Static mutex relations between actions and between actions and propositions 
are calculated in a preprocessing stage because they only depend on the 
definition of the actions. This allows us to speed up the process of calculating 
the dynamic mutex relations between propositions while generating the TG, 
obtaining better execution times than TGP. 

— TPSYS uses a multi-level temporal planning graph as Graphplan where each 
level represents an instant of time. Both TPSYS and TGP build a relaxed
graph by ignoring no-op actions and delete-edges, which allows an extremely
fast graph expansion¹ [13]. While in TGP actions and propositions
are only annotated with the first level at which they appear in the planning
graph, TPSYS annotates all different instances of actions and propositions
produced along time. The compact encoding of TGP vastly reduces the space
costs but it increases the complexity of the search process, which may
traverse cycles in the planning graph. However, the PE in TPSYS is
straightforward because it merely consists of obtaining the plan as an acyclic
flow of actions throughout the TG. The experiments show that using a much more
informed TG vastly reduces the overhead during the PE, thus obtaining
better global execution times than TGP.

¹ The TG expansion represents a small percentage of the execution time in TPSYS.
In general, hundreds of temporal levels can be generated in a few seconds for
many classical domains.

In addition to the temporal goals which TGP is able to manage, TPSYS can 
also manage propositions of the initial situation which hold at other times than 
t = 0. 

3 Our Temporal Planning SYStem 

In TPSYS, a temporal planning problem is specified as a 4-tuple
$\{I_s, A, F_s, D_{max}\}$, where $I_s$ and $F_s$ represent the initial and
final situation respectively, $A$ represents the set of actions, and $D_{max}$
stands for the maximal duration of the plan required by the user. Time is
modelled by $\mathbb{R}^+$ and its chronological order. Each action is assigned
a nondisjunctive positive duration, which may differ between actions. A
temporal proposition is represented by $\langle p, t \rangle$ where $p$ denotes
the proposition and $t \in \mathbb{R}^+$ represents the time at which $p$ is
produced. Hence, $I_s$ and $F_s$ are formed by two sets of temporal
propositions $\{\langle p_i, t_i \rangle / t_i \leq D_{max}\}$. It must be
noticed that propositions in $I_s$ ($F_s$) can be placed (required) at any time
$t_i$ throughout the execution of the plan.

TPSYS follows three consecutive stages, as shown in Fig. 1. After the first
stage, the second and the third stage are executed in an interleaved way until a
plan is found or the duration of the plan exceeds $D_{max}$.




[Figure: flow chart ending in either Failure or Success]



Fig. 1. Flow chart of the three stages of TPSYS 



In this section, we will make use of the action domain defined in Table 1 to
show the behaviour of our system. Table 1 presents a description of the actions
(with duration) of the well-known Briefcase domain. In order to simplify the
domain only three actions are defined, those which are necessary to transport a
book (B1) from home (H) to university (U) by using a briefcase (BC).

Table 1. Simplified Briefcase domain: only three actions are defined to achieve
the goal proposition at(B1,U)

Action         Duration   Precs.                         Effects
ld(B1,BC,H)    5          at(B1,H), at(BC,H), free(BC)   in(B1,BC), ¬at(B1,H), ¬free(BC)
mv(BC,H,U)     5          at(BC,H)                       at(BC,U), ¬at(BC,H)
uld(B1,BC,U)   2          in(B1,BC), at(BC,U)            at(B1,U), free(BC), ¬in(B1,BC)



3.1 First Stage: Preprocessing 

At this preprocessing stage, TPSYS calculates the static mutual exclusions which 
will facilitate the computation of dynamic mutex relations during the second 
stage. A mutex relationship between actions is defined as follows [5]: two actions 
a and b are mutex if a deletes a precondition of b, or a and b have conflicting
effects, in which case they cannot be executed in parallel. Mutex between
propositions appears as a consequence of mutex between actions. Thus, two 
propositions p and q are mutex if all actions for obtaining p are mutex with all 
actions for obtaining q. 

Let a and b be two actions, and let
$S_1, S_2 \in \{\text{precs}(a), \text{add-effs}(a), \text{del-effs}(a), \text{precs}(b), \text{add-effs}(b), \text{del-effs}(b)\}$
with $S_1 \neq S_2$. We define the function coincidences($S_1$, $S_2$) as a
boolean function which holds iff $\exists p / p \in S_1 \wedge p \in S_2$.
According to this, we introduce the following definitions:
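The coincidences function is simply a non-empty-intersection test. A minimal sketch, assuming propositions are represented as strings and each of the six sets as a Python set (a representation chosen here for illustration only):

```python
def coincidences(s1, s2):
    """Return True iff at least one proposition belongs to both sets."""
    return not set(s1).isdisjoint(s2)


# Sets taken from the ld and uld actions of the Briefcase domain (Table 1).
add_effs_ld = {"in(B1,BC)"}
del_effs_uld = {"in(B1,BC)"}
precs_ld = {"at(B1,H)", "at(BC,H)", "free(BC)"}

print(coincidences(add_effs_ld, del_effs_uld))  # True
print(coincidences(precs_ld, del_effs_uld))     # False
```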



Definition 1. Static mutex between actions. Actions a and b are statically
mutex if they cannot be executed in parallel. Although this relationship is
equivalent to Graphplan’s interference, we break it down into two different
types of mutex relationships. Hence, if two statically mutex actions are given
at the same instant of time, these actions will have to be executed in one of
the following ways:

a) Consecutive actions, iff
coincidences(add-effs(a), del-effs(b)) $\vee$
coincidences(add-effs(b), del-effs(a)).

Clearly, if a and b have conflicting effects, the correct execution order is ‘a
after b’ or ‘b after a’. In Table 1, actions ld(B1,BC,H) and uld(B1,BC,U) can
be executed in any consecutive order because their only interdependency is the
conflicting effect in(B1,BC).

b) Consecutive actions in an after order, iff
coincidences(del-effs(a), precs(b)) $\wedge$
$\neg$coincidences(precs(a), del-effs(b)).

Clearly, if a deletes a precondition of b but b does not delete any precondition
of a, the correct execution order is ‘a after b’. In Table 1, action mv(BC,H,U)
must be executed after ld(B1,BC,H) because of the proposition at(BC,H).



Definition 2. Static ap-mutex (static action/proposition mutex). An
action a is statically ap-mutex with a proposition p iff $p \in$ del-effs(a).
For instance, ld(B1,BC,H) is ap-mutex with at(B1,H) and free(BC) in Table 1.
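Definitions 1 and 2 can be checked mechanically on the Briefcase actions of Table 1. The sketch below is one possible encoding: the `Action` class, its field names and the helper names `consecutive_any_order`, `must_follow` and `ap_mutex` are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    duration: int
    precs: frozenset
    adds: frozenset
    dels: frozenset


def coincidences(s1, s2):
    return not s1.isdisjoint(s2)


def consecutive_any_order(a, b):
    """Definition 1a: a and b have conflicting effects."""
    return coincidences(a.adds, b.dels) or coincidences(b.adds, a.dels)


def must_follow(a, b):
    """Definition 1b: a must be executed after b."""
    return coincidences(a.dels, b.precs) and not coincidences(a.precs, b.dels)


def ap_mutex(a, p):
    """Definition 2: action a deletes proposition p."""
    return p in a.dels


ld = Action("ld(B1,BC,H)", 5,
            frozenset({"at(B1,H)", "at(BC,H)", "free(BC)"}),
            frozenset({"in(B1,BC)"}),
            frozenset({"at(B1,H)", "free(BC)"}))
mv = Action("mv(BC,H,U)", 5,
            frozenset({"at(BC,H)"}),
            frozenset({"at(BC,U)"}),
            frozenset({"at(BC,H)"}))
uld = Action("uld(B1,BC,U)", 2,
             frozenset({"in(B1,BC)", "at(BC,U)"}),
             frozenset({"at(B1,U)", "free(BC)"}),
             frozenset({"in(B1,BC)"}))

print(consecutive_any_order(ld, uld))  # True: in(B1,BC) is added by ld, deleted by uld
print(must_follow(mv, ld))             # True: mv deletes at(BC,H), needed by ld
print(ap_mutex(ld, "free(BC)"))        # True
```

Because these relations depend only on the action definitions, they can be computed once before the graph expansion starts, which is the point of the preprocessing stage.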

3.2 Second Stage: Temporal Graph Expansion 

We adopt the same conservative model of action as in TGP, in which i) all pre- 
conditions must hold at the start of the execution of the action, ii) preconditions 
not deleted by the action itself must hold during the entire execution of the ac- 
tion, iii) effects are undefined during the execution and only guaranteed to hold 
at the end of the action, and additionally, iv) actions can start their execution 
as soon as all their preconditions hold. 

Before describing the process performed in the second stage, we introduce the
following definitions to make the expansion of the temporal graph easier to
follow.

Definition 3. Temporal graph (TG). A TG is a directed, layered graph
with two types of nodes (proposition and action nodes) and two types of edges
(precondition-edges and add-edges). The TG alternates proposition and action
levels like Graphplan. Each level is labelled with a number representing the
instant of time at which propositions are present and actions start their
execution. Levels are ordered by their instant of time. This way, the algorithm
can easily move from a level t to the next level t' > t during the TG
expansion, and to the previous level t" < t during the plan extraction. The TG
is expanded by a forward-chaining process which simply adds the add-effects of
actions (delete-effects are ignored) at the proper level according to the
duration of actions.



Definition 4. Instance of an action. We define an instance of an action a as
the triple $\langle a, s, e \rangle$ where a denotes the action and
$s, e \in \mathbb{R}^+$ represent the times when the instance starts and ends
executing, respectively ($e = s + \text{duration}(a)$).

Definition 5. Proposition level. A proposition level P[t] is formed by the set
of temporal propositions $\{\langle p_i, t_i \rangle / t_i \leq t\}$ present at
time t which verify
$\langle p_i, t_i \rangle \in I_s \vee \exists \langle a_i, s_i, e_i \rangle / p_i \in \text{add-effs}(a_i), e_i = t_i$.
It must be noticed that ‘$\leq$’ between $t_i$ and t is used to denote that
$p_i$ persists in time. Consequently, if a proposition is present at P[t], it
will appear at all P[t'] such that t' > t.

Definition 6. Dynamic mutex between temporal propositions at P[t].
Let $\{\langle a_i, s_i, t_i \rangle\}$ and $\{\langle b_j, s_j, t_j \rangle\}$
be two sets of instances of actions which achieve
$\langle p, t_i \rangle, \langle q, t_j \rangle \in$ P[t] respectively (i.e.,
$p \in$ add-effs($a_i$) and $q \in$ add-effs($b_j$)). Temporal propositions
$\langle p, t_i \rangle$ and $\langle q, t_j \rangle$ are dynamically mutex at
P[t] iff
$\forall \alpha, \beta / \alpha \in \{\langle a_i, s_i, t_i \rangle\}, \beta \in \{\langle b_j, s_j, t_j \rangle\}$:
i) $\alpha$ and $\beta$ overlap and ii) $a_i$ and $b_j$ are statically mutex.
Otherwise, the temporal propositions are not dynamically mutex at P[t]. Loosely
speaking, two propositions are dynamically mutex at P[t] if they cannot be
simultaneously available at time t. Obviously, a dynamic mutex may expire as
new levels are expanded further in the TG, and the involved statically mutex
actions are ordered in sequence.
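The dynamic mutex test can be sketched as follows. The sketch assumes that the condition must hold for every pair of achieving instances, that instances are (action, start, end) tuples, and that their execution intervals are half-open; all three points are interpretations made for this example rather than statements from the paper.

```python
def overlap(x, y):
    """Instances are (action, start, end); intervals are taken as half-open."""
    return x[1] < y[2] and y[1] < x[2]


def dynamically_mutex(achievers_p, achievers_q, statically_mutex):
    """True iff every achieving pair overlaps in time and is statically mutex."""
    if not achievers_p or not achievers_q:
        return False  # nothing to compare, e.g. propositions of the initial situation
    return all(
        overlap(x, y) and statically_mutex(x[0], y[0])
        for x in achievers_p
        for y in achievers_q
    )


# ld and mv of Table 1 are statically mutex (mv deletes at(BC,H), a precondition of ld).
STATIC = {frozenset({"ld", "mv"})}


def is_static(a, b):
    return frozenset({a, b}) in STATIC


# At P[5]: in(B1,BC) comes from ld(0,5) and at(BC,U) from mv(0,5).
print(dynamically_mutex([("ld", 0, 5)], [("mv", 0, 5)], is_static))  # True
# At P[10] a later instance mv(5,10) exists, so the mutex has expired.
print(dynamically_mutex([("ld", 0, 5)], [("mv", 0, 5), ("mv", 5, 10)], is_static))  # False
```

The second call shows how a dynamic mutex expires: once a non-overlapping pair of instances is available, the two statically mutex actions can be ordered in sequence.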



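Definition 7. Action level. An action level A[t] is formed by the set of
instances of actions $\{\langle a_i, t, e_i \rangle\}$ which start their execution at time t because
$\forall \langle p, t_i \rangle, \langle q, t_j \rangle \in P[t]$ with $p, q \in precs(a_i)$, $\langle p, t_i \rangle$ and $\langle q, t_j \rangle$ are not
dynamically mutex at P[t].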



Proposition 1. Let P[t] ($t \le D_{max}$) be the earliest proposition level at which
all temporal propositions in $F_s$ are pairwise not dynamically mutex. Under this
assumption, no correct plan can be found before time t (i.e., the duration of a
correct plan will never be shorter than t).

Proof. According to Definition 6 the proof is simple: if the goal propositions
are not all simultaneously available until time t, the minimal possible duration
of a plan is t. Notice, however, that this statement does not guarantee
that the minimal duration of a correct plan is always t, because delete-effects have
been ignored during the TG expansion and, consequently, some goal propositions
might have been deleted.

The second stage of TPSYS expands the TG (see Fig. 2) by alternating 
proposition and action levels through a fast forward-chaining process. The TG 
expansion differs from Graphplan in the following points: 

— No-op actions are never generated because propositions persist in time 
through proposition levels. 

— Each instance of an action $\langle a, s, e \rangle$ in A[s] does not add its add-effects into
P[s+1] but into P[e]. Thus, different proposition levels are now generated from
a given action level.

— The only mutual exclusion relationship checked during the TG expansion is 
the dynamic mutex relationship between temporal propositions. 

Starting at P[0] (we will suppose there exists at least one temporal proposition
in $I_s$ which belongs to P[0]), the algorithm moves incrementally in time (from
one level to the next) throughout the TG, generating new action and proposition
levels. At each action level A[t] the algorithm generates the entire set of
instances of actions which start their execution because their preconditions are
not dynamically mutex at P[t]. After generating each instance of an action, the
Algorithm TG expansion
  $\forall \langle p_i, t_i \rangle \in I_s$ do initialize $P[t_i]$
  t = 0

  while ($t \le D_{max}$) $\land$ ($F_s$ is not satisfied in P[t]) do
    $\forall \alpha = \langle a_i, t, e_i \rangle$ which starts at A[t] do
      $A[t] = A[t] \cup \alpha$
      $P[e_i] = P[e_i] \cup add\text{-}effs(a_i)$
    t = next level in the TG
  endwhile



Fig. 2. Algorithm for the TG expansion 



propositions in add-effs are added into the proper proposition level. The TG
expansion terminates once all temporal propositions in the final situation are
present in P[t] and none are pairwise dynamically mutex (i.e. $F_s$ is satisfied in
P[t]). Moreover, if $t > D_{max}$ the algorithm outputs 'Failure' because no feasible
plan can be found earlier than $D_{max}$.

The resulting TG for the domain defined in Table 1 is shown in Fig. 3. Action
uld(B1,BC,U) cannot start at A[5] because its preconditions in(B1,BC)
and at(BC,U) are dynamically mutex at P[5] and they cannot be simultaneously
available until P[10]. Goal proposition at(B1,U) is achieved at P[12] and,
therefore, the TG expansion terminates at that level.




[Figure: TG levels P[0], A[0], P[5], A[5], P[10], P[12] at times t = 0, 5, 10, 12]



Fig. 3. Temporal Graph for the Briefcase problem defined in Table 1 






3.3 Third Stage: Plan Extraction 

The third stage is a backward search process throughout the TG to extract a
feasible plan. The algorithm uses two data structures, PlannedActs and GoalsToSatisfy,
which are indexed by a level. PlannedActs, which is initialized empty, stores the
instances of actions planned at each action level. GoalsToSatisfy stores the
temporal propositions to be satisfied at each proposition level, and it is
initialized by inserting all the temporal propositions in $F_s$.

Assuming the PE process starts from the proposition level P[t] (that is, the
search starts from time t, which indicates the duration of the plan to be
extracted, in the TG), where all temporal goals in $F_s$ are not dynamically mutex,
the algorithm proceeds in the following way:

1. If t = 0 and $GoalsToSatisfy[t] \not\subseteq I_s$, then fail (backtrack). This is the base
case for the recursive process.

2. If $GoalsToSatisfy[t] = \emptyset$ then move backwards in time (t = previous level
in the TG) and go to step 1 to satisfy the goals at t.

3. Extract a temporal proposition $\langle p, t \rangle$ from GoalsToSatisfy[t].

4. Select an instance of an action^ $\alpha = \langle a_i, s_i, e_i \rangle$ such that $p \in add\text{-}effs(a_i)$, $e_i \le t$
(backtracking point to guarantee completeness). In order to guarantee the
correctness of the plan, $\alpha$ is discarded if at least one of the following
conditions holds:

— $\exists \beta = \langle b_j, s_j, e_j \rangle \in PlannedActs$ such that $\alpha$ and $\beta$ overlap and $a_i$ and $b_j$ are
statically mutex.

— $\exists \langle q, e_i \rangle \in GoalsToSatisfy$ such that $a_i$ is statically ap-mutex with q (i.e. $a_i$
deletes q).

If $\alpha$ is discarded, another instance of an action is selected by backtracking
to step 4. Otherwise, p is satisfied and the structures $PlannedActs[s_i]$ and
$GoalsToSatisfy[s_i]$ are updated with $\alpha$ and $precs(a_i)$ respectively. Then,
the algorithm goes to step 2 to satisfy another (sub)goal.

Since delete-effects have been ignored during the TG expansion, it may not be
possible to achieve the problem goals from the proposition level P[t]. Thus, if the
PE process does not find a feasible plan from the proposition level P[t], TPSYS
goes back to the second stage to expand more levels before executing the
third stage again (see Fig. 1).

Proposition 2. TPSYS is complete. 

Informally, we can show TPSYS is complete. In TPSYS, all levels at which 
propositions and actions may appear are generated during the TG expansion. 

It is clear that, if a solution plan exists for the problem, this plan will be found
in the TG comprised from P[0] to one of the generated proposition levels and
it cannot be present in a proposition level which does not appear in the TG.
Additionally, when the PE process searches for a correct plan, all instances
of actions which achieve the (sub)goals are considered. Therefore, if a solution
exists, TPSYS finds it.

^ In TPSYS, all instances of actions are executed as late as possible, unless this leads
to an unfeasible plan.

Proposition 3. TPSYS is optimal. 

Now, we will demonstrate that TPSYS is optimal. Let us suppose the PE process
finds a plan of duration t (extracted from level P[t]), represented by $\mathcal{P}_t$, which is
non-optimal. If $\mathcal{P}_t$ is non-optimal, there exists a plan $\mathcal{P}'_{t'}$ of duration t' < t (extracted
from level P[t']) which has not been found by the PE process. But, since $\mathcal{P}'_{t'}$ is
a correct plan, all the temporal goals in $F_s$ would be not dynamically mutex at
proposition level P[t'], and the second stage would have terminated at time t'.
According to Proposition 2, TPSYS is complete and the PE process would have
found the plan $\mathcal{P}'_{t'}$, which implies that level P[t] would never have been generated
and, consequently, $\mathcal{P}_t$ would never have been found, which is inconsistent with
the initial supposition. Consequently, the first plan TPSYS finds is the optimal
plan.

Intuitively, the optimality of TPSYS is clear because it is based on Graphplan. 
Graphplan obtains the plan with the minimal number of planning steps (levels), 
because it moves throughout the planning graph and it starts the complete ex- 
traction of a plan first from the levels with minimal number of planning steps. 
TPSYS behaves in a similar way, moving throughout the TG and extracting 
the plan first from the earliest temporal levels. This way, when a feasible plan 
is found the algorithm can guarantee this plan is the plan of minimal duration 
because the level P[t] of duration t from which the process has started is the 
earliest level at which the actions of the plan satisfy the problem goals and those 
actions are not mutex. 



3.4 Application Example 

Here, we study a simple application example to illustrate the behaviour of our 
system. This abstract example shows how the proposition and action levels of 
the TG are generated. The description of the actions and their static mutex are 
shown in Table 2. $I_s = \{\langle p, 0 \rangle, \langle q, 10 \rangle\}$, so proposition q is not available until
time 10. $F_s = \{\langle s, 50 \rangle, \langle t, 50 \rangle, \langle u, 50 \rangle\}$, which implies all the goals must be
satisfied no later than 50 ($D_{max} = 50$).

An outline with the basic information of the TG is shown in Table 3. In
this table, we can see the proposition and action levels generated. Although
proposition levels have been generated until P[50], the algorithm terminates the
TG expansion at P[35] because all temporal goals in $F_s$ are not dynamically
mutex at that level. During the PE stage, the algorithm selects the instances
of actions which obtain these goals, then the preconditions of these actions and
so on. The plan obtained by TPSYS is shown in Fig. 4. The optimal duration
of the plan is 35 and therefore, if the user constrains the maximal duration to
$D_{max} < 35$, the algorithm will output 'Failure' because no plan exists.
Table 2. Domain of the actions in the application example

Action   Duration   Precs.   Effects   Static mutex
a        10         p, q     r         d
b        5          r        s         c
c        20         p        ¬s, t     b, d
d        15         r        ¬p, u     a, c



Table 3. Outline of the TG generated by the algorithm

Temporal level t   P[t]               A[t]
0                  p                  c
10                 p, q               a, c
20                 p, q, r, t         a, b, c, d
25                 p, q, r, s, t      a, b, c, d
30                 p, q, r, s, t      a, b, c, d
35                 p, q, r, s, t, u   -
40                 p, q, r, s, t, u   -
45                 p, q, r, s, t, u   -
50                 p, q, r, s, t, u   -
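
The levels in Table 3 can be reproduced with a short Python sketch of the TG expansion. This is a simplification written for illustration, not the TPSYS code: the function name `expand_tg` and its data layout are hypothetical, delete-effects are ignored as in the paper, and the dynamic mutex check is omitted. Omitting the check is harmless for this particular example, because the goals only become jointly present at level 35, where none of them are dynamically mutex.

```python
from collections import defaultdict


def expand_tg(initial, goals, actions, d_max):
    """Relaxed TG expansion (no dynamic mutex check).

    initial: iterable of (proposition, time) pairs
    goals:   set of goal propositions
    actions: name -> (duration, preconditions, add_effects)
    Returns (P, A, t) on success or None if d_max is exceeded.
    """
    new_at = defaultdict(set)           # time -> propositions added at that time
    for prop, time in initial:
        new_at[time].add(prop)
    present, P, A = set(), {}, {}
    t = 0
    while t <= d_max:
        present |= new_at[t]            # propositions persist through levels
        P[t] = set(present)
        if goals <= present:
            return P, A, t
        A[t] = set()
        for name, (duration, precs, adds) in actions.items():
            if precs <= present:        # instance <name, t, t + duration>
                A[t].add(name)
                new_at[t + duration] |= adds
        later = [u for u in new_at if u > t]
        if not later:
            break
        t = min(later)                  # next level in the TG
    return None


# Application example (Table 2), add-effects only.
actions = {
    "a": (10, {"p", "q"}, {"r"}),
    "b": (5, {"r"}, {"s"}),
    "c": (20, {"p"}, {"t"}),
    "d": (15, {"r"}, {"u"}),
}
initial = [("p", 0), ("q", 10)]
goals = {"s", "t", "u"}
P, A, t_end = expand_tg(initial, goals, actions, d_max=50)
```

The expansion stops at level 35 without generating A[35], while the add-effects scheduled for times 40, 45 and 50 account for the later proposition levels listed in the table.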



4 Some Experimental Results 

To date, our work has focused mainly on the validation of our system, rather
than on code optimization. Based on our experience, TPSYS seems quite promising
for dealing with temporal planning problems. Although comparison between our
approach and other planning systems is quite difficult because they are based
on different algorithms, we made a comparison between TPSYS and TGP. The
experiments were performed on a 400 MHz Celeron with 64 MB of memory. We solved
each problem (some simple, typical benchmarks used in the AIPS 1998 competition
and some examples provided by TGP^) with TPSYS and TGP. The results in Table 4
show that TPSYS performs better than TGP on these
problems (even on all the examples provided by TGP). Moreover, TPSYS was
able to solve problems which TGP was unable to solve, such as sussman-anomaly and
briefcase-2b-1c (which must transport two books with only one container).

As in parcPLAN [11], the main drawback of TPSYS appears in problems
with limited resources, such as briefcase-2b-1c. In this case it is necessary to
plan additional actions to release the used resources and to make them available
again. In these problems, the goals are not dynamically mutex at a proposition
level P[t], but a valid plan is not found at time t. Unfortunately, dynamic mutex

® The source code of TGP and the examples it provides have been obtained at
ftp://ftp.cs.washington.edu/pub/ai/tgp.tgz

Fig. 4. Optimal plan found by TPSYS for the application example



Table 4. Results of comparison between TPSYS and TGP (times are in milliseconds)

Problem           TPSYS   TGP
tower2            4       40
towerS            8       241
gripper2          4       40
sussman-anomaly   339     -
att-log0          39      40
att-log1          45      200
briefcase-2b-2c   5       251
briefcase-2b-1c   998     -
tgp-AB-p          4       50
tgp-AB-q          4       60
tgp-AB-pq         5       90
tgp-AC-r          4       80
tgp-AC-pr         5       80
tgp-ABDE-r        4       70

relationships are not always enough to determine a proper proposition level from 
which a feasible plan can be found. 



5 Conclusions and Future Work 

In this paper we have presented TPSYS, a system for dealing with temporal 
planning problems. TPSYS arises as a combination of the ideas of Graphplan and 
TGP to solve temporal planning problems, finding optimal plans and avoiding 
the complex reasoning performed in TGP. 

TPSYS contributes a classification of mutual exclusion relations into static
and dynamic ones. This allows a preprocessing stage to calculate
static mutexes between actions and between actions and propositions, which speeds
up the following stages. The second stage expands a TG with features of both
Graphplan and TGP planning graphs. The third stage guarantees that the first
plan found has the minimal duration. Our experience and the results obtained
show the appropriateness of TPSYS for temporal planning problems.

As commented above, TGP presents a mutual exclusion reasoning very valu- 
able for dealing with actions and propositions on a Graphplan-based approach, 
but unfortunately this complex process may make large problems become in- 
tractable. In fact, in our tests TGP was unable to solve a simple problem on the 
Briefcase domain with two books to transport and only one briefcase (which 
TPSYS was able to solve). Unfortunately, performance of TPSYS degrades in 
problems with limited resources. 

The presented work constitutes a first step towards an integrated system 
for planning and scheduling. Such a system will be able to manage temporal 
constraints on actions and to reason on shared resource utilization. Additionally, 
the system will apply several optimization criteria to obtain the plan of minimal 
duration or the plan of minimal cost. 

Another objective of our current research is the inclusion of resource ab- 
straction as proposed by Srivastava in [14]. Under this proposal, the TG will 
be generated assuming abstract resources, which are always available, avoiding 
instantiation over particular resources and consequently reducing the size of the 
TG. A special module for reasoning about resources will be included into the 
PE stage to select the specific resource for each action to be planned. 
Abstract. In this paper we present SimPlanner, an integrated planning
and execution-monitoring system. SimPlanner allows the user to monitor 
the execution of a plan, interrupt this monitoring process to introduce 
new information from the world and repair the plan to get it adapted to 
the new situation. 



1 Introduction 

Research on AI planning usually works under the assumption that the world is
accessible, static and deterministic. However, in dynamic domains things do not
always proceed as planned. Interleaving planning and execution brings many
benefits, such as being able to start executing a plan before it is complete or to
incorporate information from the external world into the planner [8]. Recent works
in this field analyse the combination of an execution system with techniques such as
plan synthesis or anticipated planning [3]. Other works on reactive planning
[10] are more concerned with the design of planning architectures than with
exploiting the capabilities of the replanning process [9].

SimPlanner is a planning simulator that allows the user to monitor the exe- 
cution of a plan and introduce/delete information at any time during execution 
to emulate an external event. It is a domain-independent, synchronous replanner 
that, like other planning systems [10], avoids generating a complete plan each 
time by retaining as much of the original plan as possible. SimPlanner has been 
specially designed for replanning in STRIPS-like domains and has been success- 
fully applied to a great variety of different domains. Additionally, the replanner 
obtains the optimal solution with respect to the number of actions for most of 
the test cases. 

2 Monitoring the Execution of a Plan 

Replanning is introduced during plan execution when an unexpected event oc- 
curs in the world. One problem with unexpected effects is deciding how they 
interact with the effects of the action that was currently being executed. Our 
solution is to assume the action took place as expected and simply to insert the 
unexpected effects after the execution of the action. 
When an event is produced it is necessary to verify that the overall plan is still
executable. Many replanning systems only check preconditions to
verify whether the next action is executable. This option is less costly and much
easier to implement in many real applications where the sensory system does
not capture all predicates that have changed in the problem. However, this
apparently efficient approach may turn out to be inefficient in the long term, as
many unnecessary actions might be introduced because changes in the plan are
not foreseen far enough in advance.




Fig. 1. SimPlanner main interface window 



Figure 1 shows the graphical interface to monitor a plan execution. The problem
corresponds to one of the instances in the robot domain on which SimPlanner
has been tested. The upper left part of the screen shows the literals
of the current state of the execution. On the right side of the screen a graph
representing the plan under execution is displayed. The circles stand for the
actions in the plan. Those actions ready to be executed at the next time step are
double-circled.

3 Replanning During Execution 

The replanning algorithm starts from the current state in the plan execution, 
when the unexpected event has been input. The objective is to discover which 
state should be reached next in the problem so as to retain as much of the 
original plan as is reasonable without compromising the optimal solution. The 
following two sections explain the algorithm in detail and provide an example 
to clarify SimPlanner behaviour. 
3.1 SimPlanner Algorithm

Initially, the user is monitoring the execution of a plan $P = (a_0 \rightarrow \cdots \rightarrow a_n)$
when an unexpected event is introduced in the system. The remaining plan to be
executed is defined as $\Pi = (a_0' \rightarrow \cdots \rightarrow a_n)$ such that $\forall a_i \in \Pi \rightarrow a_i \in P$ and $a_i$ has
not been executed yet. $a_0'$ represents the new current situation (an initial action
with effects and no preconditions).



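Algorithm SimPlanner ($\Pi$) → plan $\Pi'$

1. Build a Problem Graph (PG) alternating literal levels and action
levels $(L_0, A_1, L_1, A_2, \ldots)$, where:

$A_j = \{a_j : Pre(a_j) \subseteq L_{j-1} \land a_j \notin A_i,\ i < j\}$

$L_j = L_{j-1} \cup Add(a)\ \ \forall a \in A_j$

2. Compute the necessary state, $S(a)$, for each $a \in \Pi$:

$$S(a_i) = \begin{cases} \{l : l \in Pre(a_n)\}, & i = n \\ \Big(\bigcup_{a_j \in Next(a_i)} S(a_j) \cup Pre(a_i)\Big) - Add(a_i) - \bigcup_{a_k \in Paral(a_i)} Add(a_k), & \text{otherwise} \end{cases}$$

where:

$Next(a_i) = \{a_j \in \Pi \land a_i \rightarrow a_j\}$

$Succ(a_i) = \bigcup_j a_j \in \Pi$ such that $a_i \rightarrow \cdots \rightarrow a_j$

$Paral(a_i) = \{a_j \in \Pi \land a_j \notin Succ(a_i) \land a_i \notin Succ(a_j)\}$

3. Compute the set of possible reachable states $RS = \{S(c_1), \ldots, S(c_n)\}$,
where:

$c_i = \{a_1, a_2, \ldots\}$ such that $\forall p \in Path(\Pi) \rightarrow \exists!\, a_k \in p \land a_k \in c_i$,
$Path(\Pi)$ is the set of all possible paths between $a_0'$ and $a_n$, and $S(c_i) = \bigcup_{a_k \in c_i} S(a_k)$.

4. Select the optimal reachable state $S_{opt} = \arg\min\{f(x) : x \in RS\}$, where
$f(x) = g(x) + h(x)$.

5. Construct the final plan $\Pi' = (a_0' \rightarrow \cdots \rightarrow S_{opt}) \oplus (S_{opt} \rightarrow \cdots \rightarrow a_n)$.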



Fig. 2. SimPlanner algorithm 



Problem Graph (PG). The first step of the algorithm is to build the PG,
a graph inspired by a Graphplan-like expansion [2]. The PG may partially or
totally encode the planning problem. The PG is a relaxed graph (delete effects
are not considered) which alternates literal levels containing literal nodes and
action levels containing action nodes.

The first level in the PG is the literal level $L_0$, and it is formed by all literals
in the initial situation $a_0'$. The PG creation terminates when a literal level
containing all of the literals from the goal situation is reached in the graph or
when no new actions can be applied. This type of relaxed graph is commonly
used in many heuristic search planners [4] [5] as it allows an approximate plan
to be extracted easily.
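
As a rough illustration of the PG construction, the following Python sketch builds the alternating levels for a toy domain. The function name `build_problem_graph` and the three toy actions are hypothetical and only serve the example; the termination test follows the two conditions stated above (goal literals reached, or no new applicable actions).

```python
def build_problem_graph(initial, goal, actions):
    """Relaxed Problem Graph: delete effects are not considered.

    initial, goal: sets of literals
    actions:       name -> (preconditions, add_effects)
    Returns (literal_levels, action_levels) = ([L_0, L_1, ...], [A_1, A_2, ...]).
    """
    literal_levels = [set(initial)]          # L_0
    action_levels = []                       # A_1, A_2, ...
    used = set()                             # actions already placed in a level
    while not goal <= literal_levels[-1]:
        current = literal_levels[-1]
        new_actions = {name for name, (pre, _) in actions.items()
                       if pre <= current and name not in used}
        if not new_actions:                  # no new action can be applied
            break
        used |= new_actions
        action_levels.append(new_actions)
        added = (actions[name][1] for name in new_actions)
        literal_levels.append(current.union(*added))
    return literal_levels, action_levels


# Toy domain (illustrative only): x and z need a, y needs b.
toy_actions = {
    "x": ({"a"}, {"b"}),
    "y": ({"b"}, {"c"}),
    "z": ({"a"}, {"d"}),
}
levels, acts = build_problem_graph({"a"}, {"c"}, toy_actions)
```

Each action appears in the first action level at which its preconditions hold, so the level index of a literal gives a cheap estimate of how early it can be achieved.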

Necessary states. The second step is to compute the necessary state to execute
each action in $\Pi$. A necessary state for an action $a_i$ is the set of literals required
to execute $a_i$ and all its successors ($Succ(a_i)$). In order to compute the necessary
states, literals are propagated from the goal $a_n$ to the corresponding action by
means of the recursive formula shown in the algorithm.

Set of possible reachable states. This set comprises the necessary states
to execute a set of parallel actions $\{a_i\}$ and all their successors $Succ\{a_i\}$. A state
will be definitely reachable if its literals make up a feasible situation in the new
problem.

In a totally sequential plan $\Pi$, the possible reachable states will coincide with
the necessary states to execute each action in $\Pi$. However, when $\Pi$ comprises
parallel actions it is necessary to compute the combinations of parallel actions,
$c_i$, such that each element in $c_i$ belongs to a different path from the current state
to the goal state. In other words, $c_i$ is a set of actions that can all be executed
at the same time step. Consequently, the necessary state to execute actions in
$c_i$ denotes a state possibly reachable from $a_0'$.

Optimal state. In order to select the optimal reachable state, we define
a heuristic function $f(x) = g(x) + h(x)$ associated to the cost of a minimal
plan from the current situation $a_0'$ to the goal state $a_n$ over all paths that are
constrained to go through node x. $g(x)$ is the cost of the plan from x to $a_n$ and
is calculated directly from the original plan. $h(x)$ is the estimated length
of an approximate plan P' from $a_0'$ to x.



Algorithm Approximated plan ($a_0'$, x) → plan P'

1. Build a fictitious action $a_n'$ with preconds and no effects associated
   to state x

2. $P' = a_0' \rightarrow a_n'$
   $L = x - Add(a_0')$

3. while $L \neq \emptyset$

   3.1 select $l \in L$

   3.2 select best a for l

   3.3 insert a in P'

   3.4 update L

4. return P'



Fig. 3. Outline of the algorithm to build an approximate plan 



Step 3.1. Firstly, we form a set $UP \subseteq L$ with the unsolved preconditions of
the nearest action a to $a_0'$. If $|UP| > 1$ then literals which appear later in the
PG are removed from UP. If again $|UP| > 1$ we count the number of ways for
solving each $l \in UP$ ($\{a \in A_i : l \in L_j \land l \in Add(a) \land i \le j\}$), and select the
literal with the lowest number of actions.

Step 3.2. The best action for $l \in Pre(a)$ will be the action $a_j$ which minimizes
the number of flaws (preconditions not yet solved or preconditions of other
actions which are deleted by $a_j$). To compute the number of flaws we have to
check all possible positions of $a_j$ in the plan provided that $a_j < a$.

Step 3.3. The new action $a_j$ is inserted in the position obtained in the
previous step. This position may be sequential (between two actions) or parallel
to one or more actions. In the latter case, it must be possible to execute all
actions in parallel, i.e. none of the actions may require an Add effect of another
action or delete any of its preconditions. When this is not possible, actions must
be executed sequentially.

The length of the returned plan P' will be the value of the heuristic function
$h(x)$. Some of the properties of the heuristic function are:

— if x is an inconsistent or non-reachable state (all literals in x cannot be true
at the same time), $h(x)$ returns $\infty$,

— if x is reachable from $a_0'$, so will be the states following x. This helpful
information greatly reduces the cost of computing $h(x)$,

— although $h(x)$ is a non-admissible heuristic, it returns the optimal state for
most of the test cases in empirical evaluations.

3.2 An Application Example 

We will use an example of the Hanoi domain to illustrate the behaviour of
SimPlanner. The problem consists of four disks (huge H, big B, medium M and
small S) and four pegs P1, P2, P3 and P4. Initially, the four disks are on
P1 (Figure 4). The goal to be reached is shown in the final state of Figure
4. The unexpected situation occurs after executing the first action in the plan,
move S M P2 (move disk S from disk M to P2). At this time, a new smaller disk
(tiny T) appears on P4. Figure 5 shows plan $\Pi$, which is no longer executable
because P4 is not empty any more.




[Figure: initial state and final state (goals) of the disks on pegs P1 to P4]



Fig. 4. Initial and final situation in the problem 



Literals, denoted by numbers, are represented in Table 1. In Figure 5, literals
above each node stand for the action preconditions and literals below each node
represent the add and delete effects of the action.

The necessary states for each action in $\Pi$ are shown in Table 2. These are
calculated by applying the formula in step 2 of the SimPlanner algorithm. Then we
compute the set of possible reachable states $RS = \{S(a_1), S(a_2, a_3), S(a_4), S(a_5), S(a_n)\}$.
Table 1. Literals in the Hanoi problem

Literal   Meaning
3         on S P2
5         clear P3
6         on H P1
8         on B H
10        on M B
12        clear M
13        on T P4
15        clear S
16        clear T
19        clear P2
20        on M P3
21        clear B
23        clear P4
24        on S M
30        clear H
35        on B P4
39        clear P1
40        on H P2



Fig. 5. Remaining plan to be executed ($\Pi$)



Notice that $S(a_2)$ and $S(a_3)$ form a single state because both actions can be executed
at the same time step. The result of $f(x)$ for each possible reachable state is
shown in Table 3. h(RS1) returns $\infty$ because it is impossible to satisfy literals
12, 15, 5 and 23 due to the extra disk T. The same applies to RS2. However,
RS3 is a reachable state because disk T can be located on top of disk S and
clear S is not a condition required in RS3. The selected optimal state is $S(a_4)$
as this is the state that minimizes the value of $f(x)$. In case of equal values for
$f(x)$ we select the state with the minimum value of $h(x)$; here RS3, RS4 and RS5
all have $f(x) = 6$, and RS3 has the lowest $h(x)$.
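
The selection rule, including the tie-break on $h(x)$, can be sketched in a few lines of Python. The function name `select_optimal_state` is hypothetical; the (g, h) values are those reported in Table 3.

```python
import math


def select_optimal_state(candidates):
    """candidates: name -> (g, h). Minimise f = g + h, break ties on h."""
    return min(candidates,
               key=lambda name: (sum(candidates[name]), candidates[name][1]))


# (g, h) pairs taken from Table 3; unreachable states have h = infinity.
table3 = {
    "RS1": (5, math.inf),
    "RS2": (3, math.inf),
    "RS3": (2, 4),
    "RS4": (1, 5),
    "RS5": (0, 6),
}
best = select_optimal_state(table3)
```

Sorting on the pair (f, h) prefers, among equally good states, the one that keeps the largest part of the original plan, since a smaller $h(x)$ means a larger $g(x)$.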



Table 2. Necessary states for the hanoi example

Action   Description    Necessary state
a_n      final state    8, 20, 24, 40
a_5      move B P4 H    20, 21, 24, 30, 35, 40
a_4      move H P1 P2   6, 19, 20, 21, 24, 30, 35
a_3      move B H P4    6, 8, 20, 21, 23
a_2      move S P2 M    3, 6, 12, 15, 20, 21
a_1      move M B P3    3, 5, 6, 8, 10, 12, 15, 23
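
The necessary states in Table 2 can be recomputed with a small Python sketch of the recursive formula in step 2. Two things here are assumptions rather than material from the paper: the function name `necessary_states`, and the precondition and add lists, which are derived from the usual reading of a move action (move X from Y to Z requires on X Y, clear X and clear Z, and adds on X Z and clear Y) because Figure 5 is not reproduced in the text. The ordering constraints follow the plan described in the example, with $a_2$ and $a_3$ in parallel.

```python
from functools import lru_cache


def necessary_states(pre, add, nxt, goal):
    """S(a) for every action; pre/add map names to literal sets, nxt to successors."""
    actions = list(pre)

    @lru_cache(maxsize=None)
    def succ(a):
        out = set()
        for b in nxt[a]:
            out |= {b} | succ(b)
        return frozenset(out)

    def paral(a):
        return [b for b in actions
                if b != a and b not in succ(a) and a not in succ(b)]

    @lru_cache(maxsize=None)
    def state(a):
        if a == goal:
            return frozenset(pre[a])
        s = set(pre[a])
        for b in nxt[a]:
            s |= state(b)                 # union over Next(a)
        s -= add[a]                       # literals produced by a itself
        for b in paral(a):
            s -= add[b]                   # literals produced by parallel actions
        return frozenset(s)

    return {a: set(state(a)) for a in actions}


# Assumed encodings of the Hanoi actions, using the literal numbers of Table 1.
pre = {"a1": {5, 10, 12}, "a2": {3, 12, 15}, "a3": {8, 21, 23},
       "a4": {6, 19, 30}, "a5": {21, 30, 35}, "an": {8, 20, 24, 40}}
add = {"a1": {20, 21}, "a2": {19, 24}, "a3": {30, 35},
       "a4": {39, 40}, "a5": {8, 23}, "an": set()}
nxt = {"a1": ("a2", "a3"), "a2": ("a4",), "a3": ("a4",),
       "a4": ("a5",), "a5": ("an",), "an": ()}
states = necessary_states(pre, add, nxt, "an")
```

Under these assumptions the computed sets coincide with the six rows of Table 2, which also confirms that the union with $Pre(a_i)$ is taken before $Add(a_i)$ is subtracted.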



Finally, a plan from $a_0'$ to $a_4$ is built (t0: current state, t1: move M B P3,
t2: move S P2 M, t3: move T P4 S, t4: move B H P4) and concatenated with the
rest of the old plan from action $a_4$ (t5: move H P1 P2, t6: move B P4 H). For
Table 3. Reachability table for the hanoi problem

X     Pos. reachable state   h(x)   g(x)   f(x)
RS1   S(a_1)                 ∞      5      ∞
RS2   S(a_2) ∪ S(a_3)        ∞      3      ∞
RS3   S(a_4)                 4      2      6
RS4   S(a_5)                 5      1      6
RS5   S(a_n)                 6      0      6



our purposes, we have used the planner 4SP [6] as SimPlanner uses most of the
data structures which are managed by this planner. The plan obtained when
computing $h(x)$ is used as a lower bound in 4SP. For most of the test cases, the
plan returned by 4SP was the same as the one obtained in the computation of $h(x)$.
In the hanoi problem, $h(x)$ returns the optimal plan in four execution steps.

4 Experimental Results 

SimPlanner has been tested on several domains with different types of input 
information about external changes. The tested domains are hanoi, monkey, 
blocksworld, logistics and mobile robots navigation. 

Figure 6 shows the comparative times for several problems between generat- 
ing a complete plan from scratch or repairing only the affected parts of the plan 
(replanning process). In all except one of the test cases, the obtained plan was 
the optimal one. Temporal cost for replanning includes the cost of computing 
the reachable state plus the time for generating the plan. As we can see in Figure 6,
replanning runtimes are much better when input changes are not very significant.
In problems P5 and P7, the new current state forces the creation of
a complete plan, so the cost of replanning is slightly higher.

We have also made the same tests with planner STAN2000 [7][1]. The time 
difference between planning and replanning is about the same proportion as the 
ones shown in Figure 6. 

5 Conclusions 

SimPlanner is a planning and execution system which allows the user to monitor
the execution of a plan, interrupt the monitoring to input new information
and repair the plan under execution in response to unexpected events. SimPlanner
performs execution monitoring rather than simply testing the next action to
execute. In this way, SimPlanner anticipates forthcoming situations and adjusts
the plan accordingly.

The key point in SimPlanner is the replanning module. SimPlanner uses 
a graph-based planning approach supported by heuristic search techniques to 
efficiently replan in a dynamic world. 
[Figure: two panels, Blocksworld (12 blocks) and Hanoi (4 disks), with problems P1 to P7 grouped into slight changes and major changes]



Fig. 6. Comparative results: generating a complete plan versus replanning 



Currently, we are working on the integration of the planning algorithm and
SimPlanner. The main objective is to be able to guarantee the optimality of
the heuristic evaluation and improve the efficiency of the overall process.
Additionally, we intend to extend SimPlanner to deal with time and consumable
resources (such as the battery in mobile robot environments).
Abstract. This paper presents a hybrid hierarchical knowledge-based system
that is currently under development. This system is used for basic planning pur- 
poses at present. The important component of the system is a set of plan opera- 
tor structures. Indexing and packaging hierarchies together play a key role in 
the construction of a structure. An abstraction hierarchy is used to organize the 
domain knowledge of the system. The system utilizes some heuristic knowledge 
of the domain to avoid brittleness. The system has been implemented in the 
domain of cooking vegetables in Indian style. A restricted natural language in- 
terface is incorporated for the convenient specification of user inputs. 



1 Introduction 

For given start and goal states, a plan is a sequence of operators (or states) that
connects the start state to a goal state. The process of finding this sequence is the task of
planning. Classical planning approaches are based on search [15], [16], [20], [23]. In
contrast, knowledge-based planning approaches depend on past experience for solving
new problems. In this paper we call a system knowledge-based or memory-based if
it is based on Memory Organization Packets (MOPs) [14]. Either one or more of
abstraction (i.e., inheritance), indexing or packaging hierarchies [14] are used in 
implementing a MOP based system. Most of the memory-based planners are based on 
indexing hierarchies and these systems are referred to as case-based planning systems 
[2], [3], [4], [5], [7], [10], [21]. Some memory-based systems are based on packaging 
hierarchies [5], [10]. 

Hierarchical problem solving [4], [5], [10] is an approach using intermediate states, in 
order to reduce search. Backtracking can be avoided if the intermediate states are 
generated from memory instead of a trial and error process. In this paper we explore 
the design of a hierarchical planning system that is based on indexing and packaging 
hierarchies for organizing planning knowledge. This design is, in some sense, a fruit- 
ful combination of the approaches presented in [4], [5], [10]. The plans are tailored to 
specific requirements, with the aid of an abstraction hierarchy that is used to organize 
the domain knowledge. The system also utilizes some heuristic knowledge of the 
domain to avoid brittleness. A restricted natural language interface (RNLI) is
incorporated for the convenient specification of user inputs. The system is implemented in
Common Lisp, in the domain of cooking vegetables in Southern Indian style.
Figure 1 shows the schematic diagram of the system. The following sections give
some details of the system.




Figure 1. The schematic diagram of the system



2 Restricted Natural Language Interface 



A restricted natural language interface is incorporated for the convenient specification 
of user inputs. The interface is based on a parser and a lexicon [13], as shown in fig- 
ure 2. The parser checks each and every word of a specified input and converts it into 
a set of pre-defined predicates. For example, if the input is: “a fried dish with potato 
and brinjal” then the output is: “((style fry)(with-vegetable potato)(with-vegetable 
brinjal))”. The output is supplied to the memory of plan operators. Now we discuss 
the memory organization of the system in detail. 
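
The conversion performed by the interface can be illustrated with a minimal sketch. It is not the authors' Common Lisp parser: the lexicon contents and the names `LEXICON`, `parse` and `to_sexpr` are assumptions made for illustration, and only the example sentence and its output come from the text above.

```python
# Hypothetical lexicon: each known word maps to a predicate, or to None
# for filler words that carry no planning information.
LEXICON = {
    "a": None, "an": None, "dish": None, "with": None, "and": None,
    "fried": ("style", "fry"),
    "potato": ("with-vegetable", "potato"),
    "brinjal": ("with-vegetable", "brinjal"),
}


def parse(sentence):
    """Check every word against the lexicon and collect the predicates."""
    predicates = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise ValueError(f"word not in lexicon: {word!r}")
        entry = LEXICON[word]
        if entry is not None:
            predicates.append(entry)
    return predicates


def to_sexpr(predicates):
    """Render the predicates in the parenthesized form used in the text."""
    return "(" + "".join(f"({name} {value})" for name, value in predicates) + ")"


print(to_sexpr(parse("a fried dish with potato and brinjal")))
```

A word outside the lexicon raises an error, which mirrors the restricted nature of the interface.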



Figure 2. Restricted natural language interface (input: restricted natural language
sentence; output: supplied to the memory of plan operators)



3 Organization of the Memory 

The domain knowledge of the system is organized using an abstraction hierarchy, as
explained in section 3.1. The planning knowledge is organized based on the fact that
in structured domains there exist distinct and identifiable classes of plans. Plans
belonging to the same class are similar, whereas there are fundamental differences
between plans belonging to different classes. In addition, the system is
based on the observation that neither the case-based [1], [2], [3], [5], [7], [8], [9], [11],
[12], [14], [18], [21], [22] nor the script (or common events)-based [5], [10], [17]
approach alone is sufficient for organizing the planning knowledge in some complex
structured domains such as ours. In fact, our preliminary results show that a
combination of these two approaches is effective in organizing the knowledge. At present, we
are evaluating the effectiveness in terms of memory requirements, speed of plan
generation, and the flexibility of the system. Planning knowledge organization is
explained in section 3.2. To avoid brittleness [19], some heuristic knowledge of the
domain is incorporated into the system. The heuristic knowledge organization is
explained in section 3.3.



3.1 Memory of Domain Objects 

The memory of domain objects organizes the domain knowledge of the system, in our 
case the cooking domain. It is organized using an abstraction hierarchy. An object, 
which is either a property or a domain object, is represented as a frame [13]. Planning
information such as the piece size and cooking time of the ingredients is associated with
the objects. This information is retrieved and used by the memory of plan operators 
while generating a plan. More details of this memory are available in [4], [5], [10]. 
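
As a rough illustration of frames organized in an abstraction hierarchy, the following sketch looks up a slot on an object and falls back to its parent when the slot is missing. The class name `Frame`, the slot names and all values are hypothetical; the paper does not list the actual frames.

```python
class Frame:
    """A domain object or property; missing slots are inherited from the parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        # Walk up the abstraction hierarchy until the slot is found.
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Hypothetical planning information (piece size, cooking time in minutes).
vegetable = Frame("vegetable", piece_size="medium", cooking_time_min=15)
potato = Frame("potato", parent=vegetable, cooking_time_min=20)
peas = Frame("peas", parent=vegetable, piece_size="whole")

print(potato.get("piece_size"), potato.get("cooking_time_min"))
```

Here `potato` overrides the cooking time but inherits the piece size, which is the kind of lookup the memory of plan operators would rely on.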



3.2 Memory of Plan Operators 

The system is based on the fact that there are different classes of plans: plans
belonging to the same class are similar, and plans belonging to different classes are
fundamentally different.



In structured domains, the experience acquired in the form of cases is sometimes
assimilated into more general structures (i.e., generalized cases). This is in accordance
with Schank's notion [17] of how people acquire scripts. But sometimes the
knowledge is not assimilated into general memory structures and remains as
independent cases, as we see in case-based reasoning systems. These two different types of
knowledge organization may co-exist within the same system to handle the planning
knowledge. The demonstration of this observation is the primary content of this paper.
We also observed that, if there are many similarities between certain episodes (an
episode is a case or a sub-case), then the episodes are assimilated into general structures;
otherwise they remain as independent cases. In our system, packaging links are used
for organizing the general structures and indexing links are used for organizing
independent cases.



Figure 3. Connection of plan structures to a common root node (labels: structure for fry,
common node)



In the cooking domain, a cooking style represents a class of plans. A style is represented
in the form of a tree. The root nodes of the different styles are connected to a common
node. These links are indexed by the name of the class (i.e., cooking style). Figure 3
contains two structures connected to a common node. In this diagram, the thin lines
represent indexing links and the thick triangles represent plan structures (i.e., classes
of plans).

A highest-level plan operator identifies a plan structure. The operator in turn has
either indexing links or packaging links (but not both) to the next-level plan(s) [4], [5],
[10]. If the next-level plans are similar, they are organized using indexing links;
otherwise they are organized using packaging links. Again, each of the operators in the
next-level plan(s) has either indexing or packaging links, and so on. We now discuss the
organization of the next-level plan of an operator. There are two cases:

First Case: If the operator has indexing links then the next-level plan is in the form
of a set of plans that are created in terms of the operators at the next hierarchical level.
For example, the operator fry has indexing links to the following two plans:

(i) Basic-preparation, prepare-pieces, preparation-for-fry, perform-fry (corresponds
to the ingredient lady's finger).
(ii) Basic-preparation, boiling, preparation-for-fry, perform-fry (corresponds to the
ingredient peas).

These plans are the next-level plans for the operator fry.

Figure 4. Internal organization of a structure (operators shown: fry, utensils-for-fry,
ingredients-for-fry)

Second Case: If the operator has packaging links then the next-level plan is in the 
form of a sequence of operators that will be present in any of the expansions of that 
operator. For example, the operator preparation-for-fry, presented in the previous 
example, has packaging links to the following operators: utensils-for-fry, ingredients- 
for-fry. These two operators (in the specified order) together constitute the next-level 
plan of the operator preparation-for-fry. 

Common operators are shared between different plans. Figure 4 depicts a portion of 
the plan structure corresponding to the cooking style fry. In this figure, dotted lines 
represent indexing links and solid arcs represent packaging links. In addition, common 
operators are underlined and the indexes are not shown in the figure. 
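
The two kinds of links described above can be sketched as a small data structure. This is an illustration under stated assumptions, not the authors' implementation: the class `Operator` and its field names are hypothetical, and only the operator and plan names are taken from the fry example.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    # Indexing links: index (here, an ingredient) -> one alternative next-level plan.
    indexed_plans: dict = field(default_factory=dict)
    # Packaging links: the single next-level plan present in every expansion.
    packaged_plan: list = field(default_factory=list)

    def __post_init__(self):
        # An operator has indexing links or packaging links, but not both.
        if self.indexed_plans and self.packaged_plan:
            raise ValueError("indexing and packaging links are mutually exclusive")

    def next_level_plan(self, index=None):
        if self.indexed_plans:
            return self.indexed_plans[index]
        return self.packaged_plan


fry = Operator("fry", indexed_plans={
    "lady's finger": ["basic-preparation", "prepare-pieces",
                      "preparation-for-fry", "perform-fry"],
    "peas": ["basic-preparation", "boiling",
             "preparation-for-fry", "perform-fry"],
})
preparation_for_fry = Operator(
    "preparation-for-fry",
    packaged_plan=["utensils-for-fry", "ingredients-for-fry"])

print(fry.next_level_plan("peas"))
print(preparation_for_fry.next_level_plan())
```

Because both plans of `fry` refer to `preparation-for-fry` by name, the common operator is stored once and shared, as in Figure 4.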

If an operator has indexing links then the operator may have more than one next-level
plan. One of the next-level plans is adopted while generating a plan, and the
modified plan is stored back into the memory. In the case of packaging links, the operator
has exactly one next-level plan and this plan is always adopted while generating
a plan. The modified plan is not stored back in this case.



3.3 Memory of Heuristics

Heuristic knowledge is used to avoid the brittleness [19] of the system. To translate
user requirements into plan specifications, some common heuristics that are valid
across the spectrum of plans are incorporated into the system. For example, in the cooking
domain, a fried dish with a small amount of oil needs to have its ingredients chopped into
smaller pieces. This knowledge is organized in the form of if-then rules. The memory
of heuristics is a forward-chaining rule-based system [6]. The fact part contains the
properties of the domain objects and the input supplied by the RNLI. The rule part
contains plan (i.e., recipe) independent if-then rules. The rule-based system generates
inferences based on the input and supplies them to the memory of plan operators for plan
generation. The next section describes the planning algorithm.
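
A forward-chaining rule base of this kind can be sketched in a few lines. The function `forward_chain`, the rule encoding and the predicate names `oil` and `piece-size` are assumptions for illustration; the single rule paraphrases the oil example given above.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new fact can be inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


# Hypothetical rule: a fried dish with little oil needs small pieces.
RULES = [
    (frozenset({("style", "fry"), ("oil", "little")}), ("piece-size", "small")),
]

user_input = {("style", "fry"), ("oil", "little"), ("with-vegetable", "potato")}
refined = forward_chain(user_input, RULES)
print(sorted(refined))
```

The refined set contains the original predicates plus the inferred one, which is what would be handed to the memory of plan operators.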



4 Planner 

The RNLI accepts user input and converts it into a set of pre-defined predicates. The 
output is supplied to the memory of heuristics, where more predicates may be added to 
the set by the rule-based system. The resultant set is supplied to the memory of plan 
operators. Based on the class specified in the input, a suitable plan structure is selected 
for expansion. A plan is generated by expanding the structure in a hierarchical fashion, 
with the aid of the memory of domain objects. The actual algorithm is presented be- 
low. 

1. /* RECEIVE INPUT */ Accept the user specifications, supplied using the RNLI,
and convert them into a set of predicates.

2. /* REFINE INPUT */ Refine the input using the rule-based system in the memory
of heuristics and supply the resultant set to the memory of plan operators.

3. /* SELECT STRUCTURE */ Select the operator that represents the root node of
the required structure.

4. /* INITIALIZE */ For each level i in the hierarchy of the structure, create a list
plan-list-i to store the operator identifiers at that level. Initialize each list to NULL.
Add the root node to plan-list-0. Set i ← 0. Retrieve planning information from the
memory of domain objects.

5. /* EXPAND, MODIFY AND POSSIBLY UPDATE */ For each of the operators
(from left to right) in plan-list-i, do the following in the specified order.

(a) If it is a high-level operator then

(i) Generate the next-level plan using the indexing or packaging links.
(ii) Modify the plan using the appropriate modification rules associated with the
operator.
(iii) If indexing links were used for plan generation then store the
modified plan.

(b) If the operator is a ground-level one then

(i) If the operator has a variable that is not instantiated then compute the
value of the variable and instantiate the variable with that value.
(ii) If either the template does not have a variable or all the variables in the
template are already instantiated then leave the operator unchanged.

(c) Append the modified next-level plan or the ground-level operator to
plan-list-(i+1).

6. /* DESCEND */ If any of the elements in plan-list-(i+1) is not an instantiated
ground-level operator then set i ← (i+1) and go to step 5. Otherwise, output the
ground-level plan.
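
The expansion loop of steps 4 to 6 can be sketched as follows. This is a simplified illustration, not the authors' Common Lisp code: it omits the modification rules, variable instantiation and the storing of modified plans, it assumes the link structure is acyclic, and the `MEMORY` layout and the function name `expand` are hypothetical. Operators without an entry in `MEMORY` are treated as ground-level here purely to keep the example short.

```python
# Hypothetical memory of plan operators: each high-level operator has either
# indexing links ("index") or packaging links ("package").
MEMORY = {
    "fry": {"index": {
        "lady's finger": ["basic-preparation", "prepare-pieces",
                          "preparation-for-fry", "perform-fry"],
        "peas": ["basic-preparation", "boiling",
                 "preparation-for-fry", "perform-fry"],
    }},
    "preparation-for-fry": {"package": ["utensils-for-fry", "ingredients-for-fry"]},
}


def expand(root, memory, index):
    """Expand `root` level by level until only ground-level operators remain."""
    plan = [root]                                # plan-list-0
    while any(op in memory for op in plan):      # some operator is still high-level
        next_plan = []                           # plan-list-(i+1)
        for op in plan:                          # left to right
            links = memory.get(op)
            if links is None:                    # ground-level: leave unchanged
                next_plan.append(op)
            elif "index" in links:               # indexing links: adopt one plan
                next_plan.extend(links["index"][index])
            else:                                # packaging links: the single plan
                next_plan.extend(links["package"])
        plan = next_plan
    return plan


print(expand("fry", MEMORY, "peas"))
```

Each pass of the `while` loop corresponds to one execution of step 5 followed by the test in step 6.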



5 Comparison and Concluding Remarks 

In this paper we have presented an approach for organizing knowledge using hybrid 
hierarchical structures. We also demonstrated how this knowledge organization is 
useful for basic planning purposes. This system has used MOP-based hierarchies for 
knowledge organization. In addition, we have incorporated a rule-based expert system 
for avoiding brittleness. The system is currently under development. As a result, the 
performance results such as memory requirements, plan generation speed, and flexibil- 
ity are not presented in this paper. 

There are some similarities between this system and ABSTRIPS [15]: both are
hierarchical. Unlike ABSTRIPS, in our system there are no hierarchies among the
preconditions of the operators. Instead, the hierarchies are among the operators themselves. The
system is fundamentally different from other classical systems [16], [20], [23], which
generate plans from scratch. In our system there is already a partial plan, in the form
of packaging and indexing links. As a result, generating a plan in our system is much
easier than in classical systems.

There is some similarity between our system and MEDIATOR [18], which uses pieces of
cases in creating a case. MEDIATOR has to select a case in order to access the
relevant part, whereas in our system a complete case is never accessed in one shot.

Even though both are hierarchical, the difference between our system and CELIA [12] 
is that in CELIA, a link represents a relationship between goals in a case. In our sys- 
tem, a link connects an operator and its next-level plan(s). 




Hybrid Hierarchical Knowledge Organization for Planning 407 



There is a similarity between our system and the case-based planning system CHEF
[7]: both operate in the cooking domain. But there are some fundamental differences
between CHEF and our system. CHEF is organized completely based on indexing
hierarchies, whereas in our system a combination of indexing and packaging
hierarchies is used for memory organization. CHEF can repair its failed plans, and there is
no such mechanism yet in our system. Finally, CHEF does not have a memory of
domain objects to guide its planning process.

PRIAR [9] uses the validation structure of a plan, which represents the dependencies
among the plan steps, for retrieval and reuse purposes. Our system uses a structure for
the purpose of plan generation. In addition, unlike PRIAR, our system does not have
a backtracking facility for undoing a wrong step. Instead, our system utilizes the
memory of domain objects to guide the expansion of a structure.

The system is different from PRODIGY/ANALOGY [21], which is based on
generative planning and case-based reasoning. In PRODIGY/ANALOGY, the retrieved solution
is not used for adaptation. Instead it is used to guide a classical nonlinear planner in
generating the solution. In our system, the retrieved case is used for adaptation. The
modification knowledge and the memory of domain objects are used for this purpose.

There are some similarities between the current system and the systems presented in
[4], [5], [10]. All these systems are implemented in the cooking domain. They are all
based on the concept that, in structured domains, there are different classes of plans.
But this system is different because the systems presented in [5], [10] are based on
packaging links to organize plans. Similarly, the system presented in [4] is based on
indexing links to organize the planning knowledge. The present system, in contrast, is based
on both indexing and packaging links in organizing a class of plans. As a result, this
system is (hoped to be) more flexible in handling different types of planning knowledge. The
previous systems force us to use a particular structure, either indexing or packaging.
In addition, the systems presented in [4], [10] do not have any mechanism to avoid
brittleness.

There is a similarity between PARIS [2] and our system: both are case-based and both 
organize cases at different levels of abstraction. But there are differences too: in order 
to solve a new problem, PARIS retrieves an abstract case, whose abstract problem 
description matches with the current problem description. A generative planner that 
performs a forward directed state space search modifies the abstract case. In our sys- 
tem, a suitable structure is adopted and hierarchically modified, with the aid of the 
modification rules that are associated with the operators. 

We are currently incorporating a learning component into the system, so
that the system can learn the modification rules from a sufficient set of cases. We are
also making our system more efficient by making it learn from previous failures. In
addition, we are developing a mechanism for repairing failed plans.



Abstract. In the last few years, the field of planning in AI has
experienced great advances. Nowadays, one can use planners that solve
complex problems in a few seconds. However, building good quality plans
has not been a main issue. In this paper, we introduce a planning system
whose aim is obtaining the optimal solution w.r.t. the number of actions
while maintaining as many parallel actions as possible.



1 Introduction 

The field of planning in AI has experienced great advances over the last few
years. However, most efforts have been devoted to the development of fast
planners [8] [1], capable of solving a wide range of problems. This fact has focused
researchers' attention on the development of heuristics to accelerate the
planning process, treating the optimization of the obtained plans as less important.
However, in real application environments, apart from finding efficient planning 
processes, it is important to obtain (close to) optimal plans in order to reduce 
their execution cost. 

In general terms, current planning systems can be classified into: 

— Sequential planners, like FF [5], FISP [3] and Seristar [4]. The plan is a 
totally ordered sequence of actions. The optimal plan is the shortest one. 
Seristar guarantees this optimal solution, unlike FF, which is much faster. 

— Parallel planners, like STAN [7] and Parastar [4]. The plan is a totally 
ordered sequence of time steps in which a set of actions can be executed. 
The optimal solution consists of the plan with the minimum number of time 
steps. Both STAN and Parastar are able to find this solution. 

In this paper, we present STeLLa¹, a planner that, assuming a constant cost
for each action, builds the minimal cost plan and then reduces the number of
time steps by executing some actions in parallel. The main novelty of the STeLLa
algorithm, which implements a forward chaining search, is the use of landmarks
graphs.

* This work has been partially supported by the project n. 20010017 - Navigation for
Autonomous Mobile Robots of the Universidad Politecnica de Valencia
¹ STeLLa is an acronym of the words Sequential, TimE Layer and LAndmarks.

The concept of landmarks graph (LG) is introduced in [9], where a landmark
is defined as a literal that has to be achieved in every solution plan. The process
extracts a set of landmarks that are ordered under the concept of "reasonable
order", obtaining a LG. In [10] the idea of "reasonable order" is extended. Unlike
these previous approaches, STeLLa uses the concept of LG to determine at each
point the next set of literals to satisfy and the best actions to execute in order
to achieve this set of literals.

This paper is organized as follows. In section 2 some basic concepts about 
LGs are given. In section 3 the STeLLa algorithm is described and an example 
is shown in section 4. Section 5 gives the results obtained so far and section 6 
concludes by summarizing the strong and weak points of this algorithm. 

2 Foundations 

This section gives some basic concepts to understand the structure of a LG. 

The process to build a LG consists of two steps [10]: (1) extracting the set 
of landmarks and (2) ordering the obtained set of landmarks. 

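Definition 1. Given a planning task $\mathcal{P} = (O, \mathcal{I}, \mathcal{G})$, a fact $l_i$ is a landmark
in $\mathcal{P}$ iff $l_i$ is true at some point in all solution plans, i.e., iff for all
$P = \langle o_1, \ldots, o_n \rangle$ with $\mathcal{G} \subseteq \mathrm{Result}(\mathcal{I}, P)$:
$l_i \in \mathrm{Result}(\mathcal{I}, \langle o_1, \ldots, o_i \rangle)$ for some $0 \le i \le n$.
The side-effects of a landmark $l_i$ are defined as:
$\textit{side-effects}(l_i) := \{ \mathrm{Add}(o) - \{l_i\} \mid o \in O,\ l_i \in \mathrm{Add}(o) \}$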

The extraction process is straightforward. First, a relaxed planning graph 
(RPG) is built. Then, a backward search is performed over this RPG, selecting 
as landmarks all the literals that belong to the intersection of the preconditions of 
the actions achieving a top level goal or a landmark. The remaining preconditions 
are grouped into a disjunctive set. 
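
The selection step of the backward search can be sketched for a single goal or landmark. This is only an illustration of the intersection idea described above, not the extraction procedure of [9] or [10]: it ignores the relaxed planning graph itself, and the function name `split_preconditions` and the example preconditions are hypothetical.

```python
def split_preconditions(achievers):
    """achievers: one precondition set per action achieving the goal or landmark.

    Returns (shared, remaining): the preconditions common to all achievers,
    which become landmarks, and the rest, which form a disjunctive set.
    """
    sets = [set(pre) for pre in achievers]
    if not sets:
        return set(), set()
    shared = set.intersection(*sets)
    remaining = set().union(*sets) - shared
    return shared, remaining


# Hypothetical example: two actions achieve the same goal.
shared, remaining = split_preconditions([
    {"has-key K1 R1", "at-robot R1 L3"},
    {"has-key K1 R1", "at-robot R2 L3"},
])
print(shared, remaining)
```

In this example the shared precondition would be selected as a landmark, while the two robot positions would be grouped into a disjunctive set.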

Definition 2. Let $l_i, l_j$ be two landmarks. $l_i$ and $l_j$ are consistent if there exists a
possible state $S$ such that $l_i \in S \wedge l_j \in S$.

We use the TIMinconsistent function [2], which returns whether two literals 
are consistent or not. Once the set of landmarks has been extracted, they are 
ordered according to the following orders, which in turn define an LG: 

Definition 3. A natural order is established between two landmarks $l_i, l_j$
($l_i <_n l_j$) when in every solution plan it is necessary to solve $l_i$ to achieve $l_j$, that is,
$l_i$ is a precondition of all the actions that satisfy $l_j$.

Definition 4. We establish a weakly reasonable order in the following cases:

— landmarks $l_i$ and $l_j$ that are naturally ordered before the same node $l_k$ can be
ordered $l_i <_{wr} l_j$ if $\exists$ landmark $x : x <_n l_i \wedge \mathrm{TIMinconsistent}(x, l_j) = \mathrm{TRUE}$

— two landmarks $l_i$ and $l_j$ can be ordered $l_i <_{wr} l_j$ if there exists some other
landmark $x$, and $x$ and $l_j$ are ordered before the same node; and there is an
ordered sequence of $<_n$ orders that puts $l_i$ before $x$. In this situation, $l_i$ and $l_j$
are weakly ordered if $\exists$ landmark $y : y <_n l_i \wedge \mathrm{TIMinconsistent}(y, l_j) = \mathrm{TRUE}$

— a pair of landmarks $l_i$ and $l_j$ is ordered $l_i <_{wr} l_j$ if $\exists$ landmark
$x : l_i <_n x \wedge l_j <_{wr} x \wedge \mathrm{TIMinconsistent}(\textit{side-effects}(l_j), l_i) = \mathrm{TRUE}$

Algorithm STeLLa $(\mathcal{I}, \mathcal{G}) \rightarrow$ plan $\mathcal{P}$ organized in time steps

Build $LG(\mathcal{I}, \mathcal{G})$
$\mathcal{C} = \mathcal{I}$; $time = 0$
While $\mathcal{G} \not\subseteq \mathcal{C}$
    1. Compute fringe $\mathcal{F}$
    2. Select a consistent subset $CS$ of landmarks from $\mathcal{F}$
    3. Compute the set of actions $A$ that solve $CS$
    4. Select the actions from $A$ that have to be executed
    5. If $A = \emptyset$, build $LG(\mathcal{C}, \mathcal{G})$
       Else execute $A$, obtaining the new $\mathcal{C}$
            $\mathcal{P}_{time} = A$; $time = time + 1$

Fig. 1. STeLLa algorithm

Definition 5. A LG is a graph $(N, E)$ where:

— $l_i \in N$ if $l_i$ is a landmark or a disjunctive set².

— $l_i, l_j \in N$, $(l_i, l_j) \in E$ if $l_i <_n l_j \vee l_i <_{wr} l_j$.

— Moreover, the following information extracted from side-effects is added: if
$l_i$ is a side-effect of $l_j$ and vice versa, a conjunctive set $[l_i, l_j]$ is built, so that
$[l_i, l_j]$ is added to $N$ and $\forall l_k \mid (l_k, l_i) \in E \vee (l_k, l_j) \in E \rightarrow E = E \cup (l_k, [l_i, l_j])$

Because the extraction of landmarks and the computation of orders
are approximate computations, the information in the LG can be incomplete.

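Definition 6. Let $a_i, a_j$ be two actions:

— $a_i$ causes a NF1 conflict over $a_j$ if $\exists d \in \mathrm{Del}(a_i) \mid d \in \mathrm{Pre}(a_j) \wedge a_i < a_j$

— $a_i$ and $a_j$ are in NF2 conflict if $\exists d_i \in \mathrm{Del}(a_i) \mid d_i \in \mathrm{Pre}(a_j) \wedge \exists d_j \in
\mathrm{Del}(a_j) \mid d_j \in \mathrm{Pre}(a_i)$

— $a_i$ causes a F conflict over $a_j$ if $\exists d \in \mathrm{Del}(a_i) \mid d \in \mathrm{Pre}(a_j) \vee d \in \mathrm{Add}(a_j)$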
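
The three conflict types of Definition 6 translate directly into set operations. The sketch below is an illustration, not the STeLLa implementation: the `Action` class, the function names and the encodings of the two example actions (their preconditions, add and delete effects) are assumptions, and the ordering $a_i < a_j$ is passed in as a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def nf1(a_i, a_j, a_i_before_a_j=True):
    """a_i deletes a precondition of a_j and is ordered before it."""
    return a_i_before_a_j and bool(a_i.delete & a_j.pre)


def nf2(a_i, a_j):
    """Each action deletes a precondition of the other."""
    return bool(a_i.delete & a_j.pre) and bool(a_j.delete & a_i.pre)


def f_conflict(a_i, a_j):
    """a_i deletes a precondition or an add effect of a_j."""
    return bool(a_i.delete & (a_j.pre | a_j.add))


# Hypothetical encodings of two actions from the robot example.
get_key = Action("Get-key R1 L4 K1",
                 pre=frozenset({"at-robot R1 L4", "at K1 L4"}),
                 add=frozenset({"has-key K1 R1"}),
                 delete=frozenset({"at K1 L4"}))
go_to = Action("Go-to R1 L4 L2",
               pre=frozenset({"at-robot R1 L4"}),
               add=frozenset({"at-robot R1 L2"}),
               delete=frozenset({"at-robot R1 L4"}))

print(f_conflict(go_to, get_key), f_conflict(get_key, go_to), nf2(go_to, get_key))
```

Under these assumed encodings, moving away from L4 deletes a precondition of picking up the key, so `go_to` causes a F conflict over `get_key` but not the other way round.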

3 Algorithm 

The algorithm implemented in STeLLa is shown in Figure 1. Each point in this 
algorithm is explained in the following sections. 

3.1 Computation of the fringe $\mathcal{F}$

In each iteration, STeLLa computes the set of landmarks to be solved at the
current time step, called the fringe and denoted by $\mathcal{F}_{time}$. Let $LG(N, E)$ be the
current LG and $l_i \in N$; $l_i \in \mathcal{F}_{time}$ if $\forall l_j \in N \mid (l_j, l_i) \in E$, $l_j \in \mathcal{F}_{time'}$, $time' <
time$. From now on, we will identify $\mathcal{F}_{time}$ as $\mathcal{F}$.

² For the sake of simplicity, we call landmarks all the elements in N throughout the
rest of the paper.
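
The basic fringe condition can be sketched as a predecessor check over the edges of the LG. The function name `fringe`, the representation of earlier fringes as one set, and the exclusion of landmarks that already appeared in an earlier fringe are assumptions made for this illustration.

```python
def fringe(nodes, edges, earlier):
    """Landmarks whose predecessors in the LG all belong to earlier fringes.

    nodes: iterable of landmarks; edges: iterable of (l_j, l_i) pairs meaning
    l_j is ordered before l_i; earlier: landmarks from fringes of previous steps.
    """
    predecessors = {n: set() for n in nodes}
    for source, target in edges:
        predecessors[target].add(source)
    # Assumption: a landmark already placed in an earlier fringe is not repeated.
    return {n for n in nodes if n not in earlier and predecessors[n] <= earlier}


nodes = {"a", "b", "c"}
edges = {("a", "b"), ("b", "c")}
print(fringe(nodes, edges, set()))
print(fringe(nodes, edges, {"a"}))
```

With the chain a, b, c the first fringe contains only `a`, and once `a` belongs to an earlier fringe the next one contains only `b`.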

However, due to the incompleteness of the LG, it might be the case that not
all the landmarks in $\mathcal{F}$ can be achieved right now. Therefore, it is necessary to
check whether a literal has to be postponed. First, for disjunctive sets,
only natural orders have been computed during the process for building a LG.
Therefore, a disjunctive set $l_i$ belonging to a $LG(N, E)$ is added to the fringe
$\mathcal{F}$ if $\forall l_j \in N \mid (l_j, l_i) \in E$, $\forall l_k \in N \mid (l_k, l_j) \in E \rightarrow l_k \in \mathcal{F}_{time'}$, $time' \le time$.

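On the other hand, a landmark might be delayed because its producer actions
remove a literal which is required later in the planning process. Let $LG(N, E)$
be a LG and $l$ be a landmark in its corresponding $\mathcal{F}$.

1. $P = \{ l_j \in N \mid (l_j, l) \in E \wedge \mathrm{TIMinconsistent}(l_j, l) = \mathrm{TRUE} \}$

2. If $\exists p \in P \mid \textit{out-degree}(p) > 1 \wedge \exists l' \in N \mid l' \notin \mathcal{F} \wedge (p, l') \in E \wedge
\mathrm{TIMinconsistent}(l', p) = \mathrm{FALSE}$, then $\mathcal{F} = \mathcal{F} - \{l\}$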

3.2 Computation and selection of consistent subsets 

Due to incompleteness in the LG, it may be the case that not all the landmarks
in $\mathcal{F}$ can be in the same state, that is, they are inconsistent. Therefore, the first
step in order to ensure that the literals in $\mathcal{F}$ can be achieved at the same time
from the current state is building consistent subsets from $\mathcal{F}$.

Definition 7. Let $LG(N, E)$ and $\mathcal{F}$ be the current LG and fringe. A $CS_i$ is a set
of literals where $\forall l_j \in CS_i$, $l_j \in \mathcal{F} \wedge \nexists l_k \in CS_i \mid \mathrm{TIMinconsistent}(l_j, l_k) = \mathrm{TRUE}$

Once $\mathcal{F}$ has been divided into consistent sets $CS_i$, we select one of them,
called $CS$, which will determine the subset of landmarks to be satisfied at this
iteration. Therefore, we evaluate each $CS_i$ under the following criteria and select
the one with the best value:

1. We compute an approximation $\mathcal{C}'$ to the new state from the current state $\mathcal{C}$ in
the following way:
$\mathcal{C}' = CS_i \cup \{ l_j \in \mathcal{C} \mid \forall l_k \in CS_i,\ \mathrm{TIMinconsistent}(l_j, l_k) = \mathrm{FALSE} \}$

2. The next step is to build the LG for $\mathcal{C}'$ and evaluate it by using the same criteria
that will be applied in the following section for the action selection.
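
The division of the fringe into consistent subsets and the approximation of the new state can be sketched as follows. This is an illustration under assumptions: it enumerates only maximal consistent subsets by brute force (Definition 7 does not state maximality, but the example in section 4 uses maximal sets), and the predicate `different_places` is a hypothetical stand-in for the TIMinconsistent function of [2].

```python
from itertools import combinations


def consistent_subsets(fringe, inconsistent):
    """Maximal subsets of the fringe that contain no inconsistent pair."""
    fringe = sorted(fringe)
    maximal = []
    for size in range(len(fringe), 0, -1):
        for subset in combinations(fringe, size):
            if any(inconsistent(a, b) for a, b in combinations(subset, 2)):
                continue
            candidate = set(subset)
            if not any(candidate < kept for kept in maximal):
                maximal.append(candidate)
    return maximal


def approximate_state(cs, current, inconsistent):
    """CS plus every literal of the current state consistent with all of CS."""
    return set(cs) | {l for l in current
                      if all(not inconsistent(l, k) for k in cs)}


def different_places(a, b):
    """Stand-in inconsistency test: one robot cannot be at two locations."""
    return a[0] == b[0] == "at-robot" and a[1] == b[1] and a[2] != b[2]


F = {("at-robot", "R1", "L4"), ("at-robot", "R1", "L2"), ("at-robot", "R2", "L3")}
subsets = consistent_subsets(F, different_places)
print(subsets)

current = {("at-robot", "R1", "L1"), ("at-robot", "R2", "L5"), ("at", "K1", "L4")}
print(approximate_state(subsets[1], current, different_places))
```

For the fringe of the robot example this yields the two subsets that appear as $CS_1$ and $CS_2$ in section 4.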



3.3 Computation and selection of actions 

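For each literal $l_i$ in $CS$, the set of applicable actions $A_i$ is computed according
to the type of $l_i$.

1. Literal: $A_i = \{ o \in O \mid l_i \in \mathrm{Add}(o) \}$

2. Conjunctive set: $A_i = \{ o \in O \mid l_i \subseteq \mathrm{Add}(o) \}$

3. Disjunctive set: $A_i = \{ o \in O \mid \exists l_j \in l_i,\ l_j \in \mathrm{Add}(o) \}$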
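
The three cases above can be sketched with set membership, inclusion and intersection on the add effects. The function name `achievers`, the `kind` argument and the example add effects are hypothetical and serve only to illustrate the three definitions.

```python
def achievers(landmark, kind, add_effects):
    """Names of the actions that achieve a landmark of the given kind.

    add_effects: dict mapping an action name to the set of literals it adds.
    For "conjunctive" and "disjunctive", `landmark` is a collection of literals.
    """
    if kind == "literal":
        return {name for name, add in add_effects.items() if landmark in add}
    if kind == "conjunctive":      # every literal of the set must be added
        return {name for name, add in add_effects.items() if set(landmark) <= add}
    if kind == "disjunctive":      # at least one literal of the set must be added
        return {name for name, add in add_effects.items() if set(landmark) & add}
    raise ValueError(f"unknown landmark kind: {kind}")


# Hypothetical add effects for three actions.
ADD = {
    "Go-to R1 L4 L2": {"at-robot R1 L2"},
    "Go-to R2 L3 L2": {"at-robot R2 L2"},
    "Get-key R1 L4 K1": {"has-key K1 R1"},
}

print(achievers("has-key K1 R1", "literal", ADD))
print(achievers(["at-robot R2 L2", "at-robot R1 L2"], "disjunctive", ADD))
```

A disjunctive set therefore usually yields a larger $A_i$ than a single literal, since achieving any one of its members is enough.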

We define $A$ as a set containing the actions to be executed in the current
iteration. In order to build this set, only one action $a_{ij}$ from each $A_i$ is selected
by using the heuristic function $h(a_{ij}) = f(\text{applicable}, \text{NF1}, \text{NF2}, \text{LG-correct},
\text{LG-delete})$, where:

1. applicable($a_{ij}$) = 1 if $a_{ij}$ can be executed in the current state and its execution
does not generate a previously reached state; 0, otherwise.

2. NF1($a_{ij}$) = 1 if $a_{ij}$ produces a NF1 conflict with all the landmarks
immediately following $\mathcal{F}$; 0, otherwise.

3. NF2($a_{ij}$) = 1 if $a_{ij}$ generates a NF2 conflict with every action in another set
$A_k$; 0, otherwise.

4. Build the LG corresponding to the state resulting from the application of
$a_{ij}$ in the current state. LG-correct($a_{ij}$) = 1 if none of the actions that would
solve the set $\mathcal{F}$ of this LG would generate a NF1 or NF2 conflict; 0, otherwise.

5. LG-delete($a_{ij}$) is the number of landmarks that would have to be reachieved
after applying $a_{ij}$, i.e., the number of delete effects of $a_{ij}$ which are required
again as preconditions in the LG.

However, some conflicts may arise among the actions in $A$, and then some
actions must be postponed. For each pair of actions $a_i, a_j \in A$, if $a_i$ and $a_j$
produce a NF2 conflict, two new sets are built by removing the landmarks that
$a_i$ and $a_j$ achieve respectively, and one of them is selected according to the
selection criteria for consistent sets; otherwise, if $a_i$ produces a F conflict over
$a_j$, $a_i$ is ruled out from $A$.



4 Application example 

In this section the behaviour of the STeLLa algorithm is shown through an
example whose goal is moving two objects: object O1 is in a secure box and object
O2 is held by one robot at L5, and there is another robot at L1. The plan to
solve this problem entails taking a key at L4 and pushing an elevator from L2 to L3,
where the secure box is, then going up with the elevator to reach the secure box
and opening it with the key to grab O1; and carrying O2 to L3 and dropping it there. Not
all locations are connected to each other: it is not possible to move directly
from L3 to L4 or from L1 to L3, and from L5 it is only possible to reach L3.

In the LG for the initial state (Fig. 2), $\mathcal{F}$ = {at-robot R1 L4, at-robot R1
L2, at-robot R2 L3}. Not all the literals in $\mathcal{F}$ are consistent, so two subsets are
built: $CS_1$ = {at-robot R1 L2, at-robot R2 L3}, $CS_2$ = {at-robot R1 L4,
at-robot R2 L3}. Then we evaluate these sets by building the corresponding LGs
(Figs. 3 and 4). We choose $CS_2$ because no conflicts are found, whereas in the LG
corresponding to $CS_1$ a NF2 conflict is generated between the actions that would
solve the landmarks (at-robot R1 L4) and (at E1 L3). The executable actions for
solving each of the landmarks in this set are Go-to R1 L1 L4 and Go-to R2 L5 L3.
Now, the new $\mathcal{F}$ (Fig. 4) is {(at-robot R2 L2 + at-robot R1 L2), at O2 L3,
has-key K1 R1, at-robot R1 L2}. Since no more conflicts arise, the selected actions
are: Get-key R1 L4 K1, Drop R2 L3 O2, Go-to R1 L4 L2. However, actions
Get-key R1 L4 K1 and Go-to R1 L4 L2 produce a F conflict, so the latter is
postponed to the next time step. From this moment on, the process is similar,
obtaining the following plan:
$\mathcal{P}_1$ = Go-to R1 L1 L4, Go-to R2 L5 L3;
$\mathcal{P}_2$ = Get-key R1 L4 K1, Drop R2 L3 O2;
$\mathcal{P}_3$ = Go-to R1 L4 L2;
$\mathcal{P}_4$ = Push-elevator R1 L2 L3 E1;
$\mathcal{P}_5$ = Climb R1 L3 E1;
$\mathcal{P}_6$ = Get-from-secure-box R1 L3 O1 E1 K1.

Fig. 2. Initial situation for the robot problem




Fig. 3. A part of the LG corresponding to CSi in the robot problem 



5 Experiments 

Tables 1 and 2³ show a comparison between the results obtained with STeLLa
and other planners. Table 1 shows that, for sequential and logistics domain
problems, there are only a few differences among the planners. For example, in the
Huge and Robot_connected_t1 problems, FF returns plans longer than necessary; in
logistics, Parastar obtains the optimal solution in time steps, but its number of
actions is clearly larger than necessary. In prob05 and ptial, STeLLa optimizes the
total number of actions, whereas STAN optimizes the number of time steps.
In the loga problem, neither STeLLa nor STAN obtains the optimal solution. As
for problems in the freecell domain, it is remarkable that, although
STeLLa cannot obtain the optimal solution in 3-1, it clearly outperforms FF and
STAN in problems from 4-1 on. In miconic, STAN obtains slightly longer plans
than FF and STeLLa.

Execution times are not shown because, as we explained in the introduction,
the main objective of STeLLa is improving plan quality, and therefore
performance is not a main issue. However, it must be said that STeLLa's performance
is comparable with STAN's in most of the problems. On the other hand, the
performance of both Seristar and Parastar is worse than STeLLa's.

³ It has been impossible to execute Seristar and Parastar over the Freecell and Miconic
domains.

Fig. 4. LG corresponding to $CS_2$ in the robot problem



Table 1. Comparison in sequential and logistics domain problems. Results given in
number of actions or time steps/actions.

Problem              Seristar  FFv2.2  Parastar  STANv3 (2000)  STeLLa
Blocks_Tw_rever10    20        20      20/20     20/20          20/20
Blocks_Tw_diff_10    36        36      36/36     36/36          36/36
Blocks_Huge          18        24      18/18     18/18          18/18
Blocks_Huge2         10        12      10/10     10/10          10/10
Robot_connected_t1   6         7       6/6       (6/6)          6/6
Robot_connected_t2   6         6       6/6       (6/6)          6/6
log_prob02           15        15      9/22      9/15           9/15
log_prob04           15        15      9/20      9/15           9/15
log_prob05           9         9       6/10      6/10           8/9
log_prob06           -         25      11/41     11/25          11/25
log-ptial            -         24      10/44     (10/25)        11/24
log_loga             -         52      -         (11/54)        14/53



6 Conclusions and further work 

We have presented STeLLa, a planner that, for most of the solved problems, obtains
the optimal solution w.r.t. the number of actions while allowing parallelism.

The weak point of the algorithm is its high dependency on the LG. Because the
information in the LG can be quite incomplete, some difficulties may arise during
the plan construction. This is the case for those problems for which STeLLa has
been unable to achieve the optimal solution. In these situations, the LG has
to be analyzed in depth in order to find further constraints between landmarks.
Also, other techniques could be used to extract the necessary information. In this
sense, it would be very helpful to find new criteria for the selection of consistent
subsets and actions. Criteria such as distance to the goal state might
reduce the number of action combinations in each $A_i$, thus improving STeLLa's
performance.

Table 2. Comparison in freecell and miconic domains. Results for STAN in brackets
indicate that the version used is 2000.

Problem       FF  STAN       STeLLa  Problem    FF  STAN  STeLLa
freecell-2-1  9   6/9 (9)    6/9     mic-s9-0   31  (31)  26/31
freecell-2-2  8   6/8 (8)    6/8     mic-s11-0  37  (37)  31/37
freecell-2-3  9   5/9 (8)    5/9     mic-s13-0  44  (44)  36/44
freecell-2-4  9   6/10 (8)   6/9     mic-s15-0  46  (47)  33/46
freecell-2-5  9   6/9 (10)   6/9     mic-s16-0  53  (54)  42/53
freecell-3-1  21  7/15 (16)  9/16    mic-s17-0  56  (59)  44/56
freecell-3-2  20  7/14 (16)  7/14    mic-s18-0  59  (59)  46/59
freecell-4-1  26  (24)       12/21   mic-s19-0  62  (63)  48/62
freecell-4-2  26  (22)       9/19    mic-s20-0  64  (65)  48/64
freecell-5-1  37  (34)       12/27   mic-s21-0  70  (70)  56/70
freecell-5-2  33  (34)       14/27   mic-s22-0  73  (74)  58/73
freecell-6-1  43  (42)       20/34   mic-s23-0  76  (79)  60/76



In conclusion, and taking into account that this is the first version of STeLLa,
we consider the results very promising.